lwIP UDP problem in 2.00.00.05 on the BeagleBone.


This question is answered
Posted by Jay Larson on Mar 14 2012 12:41 PM

Hi,

I'm having problems with lwIP as supplied in StarterWare 2.00.00.05 for the AM335x, targeting the BeagleBone. 

 

I made a very simple change to the example program "enetEcho" to add a periodic broadcast of a UDP packet every 1/2 second. 

Here is the code segment with my additions (the UDP setup after echo_init() and the body of the main loop):

   

..... SNIP ......

    /* Initialize the sample echo server. */
    echo_init();

    struct udp_pcb *pcb = udp_new();
    struct pbuf *p;
    char msg[80];
    int packet = 0;

    udp_bind(pcb, IP_ADDR_ANY, 1235);

    /* Loop forever. All the network work is done in interrupt handlers. */
    while(1)
    {
        packet++;

        /* Allocate a pbuf (ref = 1). */
        p = pbuf_alloc(PBUF_TRANSPORT, sizeof(msg), PBUF_RAM);
        if (!p || p->ref != 1)
        {
            UARTPuts("pbuf_alloc failed!\n", -1);
            while (1) ;
        }

        /* Zero-pad the payload, then copy in the message text. */
        sprintf(msg, "test packet %d", packet);
        memset(p->payload, 0, p->len);
        memcpy(p->payload, msg, strlen(msg));

        UARTprintf("packet %d, PBuf at 0x%08X, ref = %d sending...",
                   packet, (unsigned int)p, p->ref);
        udp_sendto(pcb, p, IP_ADDR_BROADCAST, 1234);
        delay(500); /* half-second delay: plenty of time to send */
        UARTprintf("ref is now %d \n", p->ref);

        /* Drop our reference. Once the TX ISR has also released its
           reference, ref reaches zero and the pbuf is returned. */
        pbuf_free(p);
    }
}

   

This works great for about 30 packets or so.  Then suddenly they stop being sent.  Below is a typical output:

 

StarterWare AM335x Boot Loader
Copying application image from MMC/SD card to RAM
Acquiring IP Adress...
EVM IP Address Assigned: 192.168.9.106packet 1, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 2, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 3, PBuf at 0x80018060, ref = 1 sending...ref is now 1
.... and so on....

packet 21, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 22, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 23, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 24, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 25, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 26, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 27, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 28, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 29, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 30, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 31, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 32, PBuf at 0x80018060, ref = 1 sending...ref is now 1
packet 33, PBuf at 0x80018060, ref = 1 sending...ref is now 2
packet 34, PBuf at 0x80018100, ref = 1 sending...ref is now 2
packet 35, PBuf at 0x800181a0, ref = 1 sending...ref is now 2
packet 36, PBuf at 0x80018240, ref = 1 sending...ref is now 2

I can monitor the packets on the wire, and in this case the last one sent was number 32. This is consistent with the reference count not being decremented for packet 33. Sometimes it makes it to 50 packets before the stack dies, but never much further. 

Further investigation revealed that the interrupt handler that should be called after each transmission stops being called after a few dozen packets... but why? Any ideas would be very welcome! I hope I haven't made some silly assumption about the use of lwIP under StarterWare.

Cheers,

-Jay
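For anyone repeating this test, the stall point can be read straight off the UART log: the first line ending in `ref is now 2` marks the first pbuf whose TX-complete interrupt never ran. Below is a minimal host-side sketch for scanning a captured log. It assumes one line per packet in the format printed by the loop above (the later driver traces in this thread interleave extra output and would need filtering first); `first_stuck_packet` is a hypothetical helper, not part of StarterWare or lwIP.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a captured UART log line by line and return the
 * packet number of the first line reporting "ref is now 2" (or higher), i.e.
 * the first pbuf the driver never released. Returns -1 if none is found.
 * Assumes one "packet N, ... ref is now M" line per packet. */
int first_stuck_packet(const char *text)
{
    char line[128];
    const char *pos = text;

    assert(text != NULL);
    while (*pos != '\0') {
        /* Copy one line (without its newline) into a bounded buffer. */
        size_t n = strcspn(pos, "\n");
        size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;
        memcpy(line, pos, copy);
        line[copy] = '\0';

        const char *start = strstr(line, "packet ");
        const char *tail = strstr(line, "ref is now ");
        int packet, ref;
        if (start != NULL && tail != NULL &&
            sscanf(start, "packet %d", &packet) == 1 &&
            sscanf(tail, "ref is now %d", &ref) == 1 &&
            ref > 1) {
            return packet;
        }

        /* Advance past this line and its newline, if any. */
        pos += n;
        if (*pos == '\n')
            pos++;
    }
    return -1;
}
```

Fed the output shown above, this would report packet 33, the same point at which packets stopped appearing on the wire.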

All Replies
  • Posted by Jay Larson on Mar 16 2012 11:24 AM

    Still trying to figure this one out...  I'm stumped.

    I have learned much more about the 'sitaraware' implementation of the lwIP driver. It seems that everything is intended to be done in an IRQ. This works fine for some things, like extremely quick responses to a network client's request. But this approach has its disadvantages, and I think that is what I'm up against. These disadvantages are:

    1) Servicing other interrupts with very low latency requirements: because StarterWare is limited to one active IRQ at a time, my other important interrupts can't be served while the network is being serviced. Yes, the network servicing is fast, but is it always fast enough?

    2) Initiating UDP traffic: lwIP is non-reentrant. So if I send a UDP packet from the foreground (i.e. directly from main), lwIP may be interrupted by itself and re-entered, leading to instability. In my application, UDP sends need to be triggered by external events unrelated to the network.

    So, to at least work around difficulty number 2, I disabled interrupts whenever main called into lwIP. I would have expected this to fix the problem, but it does not! The lwIP stack still stops sending UDP packets after a while. I can't explain this.

    Is there anyone out there who has worked with lwIP inside StarterWare and would be willing to share some insights?

     

    Cheers,

    -Jay
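Masking interrupts around lwIP calls made from main() did not cure the stall here, but the pattern itself is worth getting right: save the IRQ state, mask, make the lwIP call, then restore the saved state rather than unconditionally re-enabling, so the wrapper stays safe if it is ever nested. The sketch below simulates the CPU's IRQ mask with a flag so it runs on a host. `irq_disable_save()`, `irq_restore()` and `call_with_irqs_masked()` are hypothetical names; on the target they would wrap StarterWare's own IRQ masking calls (for example `IntMasterIRQDisable()`/`IntMasterIRQEnable()`; check your release's interrupt.h for the exact functions).

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side stand-in for the CPU's IRQ mask bit. */
static bool irq_enabled = true;

/* Hypothetical: mask IRQs and report whether they were enabled before.
 * On the target this would wrap StarterWare's IRQ-masking call. */
static bool irq_disable_save(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

/* Hypothetical: re-enable IRQs only if they were enabled on entry. */
static void irq_restore(bool was_enabled)
{
    if (was_enabled)
        irq_enabled = true;
}

/* Run one lwIP call (wrapped in fn) with the TX/RX ISRs unable to run,
 * so the stack cannot be re-entered while main() is inside it. */
static int call_with_irqs_masked(int (*fn)(void *), void *arg)
{
    bool saved = irq_disable_save();
    int result = fn(arg);
    irq_restore(saved);
    return result;
}

/* Example callback: records the IRQ state seen while it ran
 * (a real one would call udp_sendto() and return its err_t). */
static int record_irq_state(void *arg)
{
    *(bool *)arg = irq_enabled;
    return 0;
}

/* Nested use: the inner wrapper must leave IRQs masked for the outer one.
 * Returns 0 if IRQs were still masked after the inner call. */
static int nested_call(void *arg)
{
    call_with_irqs_masked(record_irq_state, arg);
    return irq_enabled ? 1 : 0;
}
```

Restoring the saved state instead of always re-enabling is the design choice that matters: a helper that blindly re-enables IRQs would open a window for the ISR in the middle of an outer masked section.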

  • Posted by Jay Larson on Mar 20 2012 11:47 AM

    ... Not much help here. 

    I think the problem is more serious. Does the lwIP implementation in the enetEcho StarterWare app have a serious problem? Even unmodified, enetEcho will crash on its own! 

    Steps to reproduce:

    1. Convert the original enetEcho.bin (from C:\ti\AM335X_StarterWare_02_00_00_05\binary\armv7a\cgttms470_ccs\am335x\beaglebone\enet_echo) to an app image with the tiimage utility.
    2. Copy it to the StarterWare bootable SD card.
    3. Insert the SD card back into the BeagleBone.
    4. While monitoring the serial port (through the provided USB connection), restart the board using the reset button.
    5. It should boot, copy the app image to memory, and start running. It should print the IP address.
    6. Make a raw TCP connection to that IP address on port 2000. I use PuTTY.
    7. In a Windows command prompt (for example), start a continuous ping to the board at the same address (e.g. ping 192.168.1.89 -t).
    8. Back in the TCP console window, type in a few lines (it usually takes about 20-50 lines for me). Eventually the echoes stop coming back, and then the connection times out. Soon after, the ping also stops responding. Reconnecting is also impossible. 
    9. Reset the board, and it will work again... for a while.

    Can someone else reproduce this, please? Am I looking at a hardware failure or a flaw in the Ethernet driver provided in StarterWare? I've used this board with other software (using Ethernet) and not had any problems, so I'm inclined to think it is not my hardware. But independent verification would help my sanity!

    Cheers,

    -Jay

     

    PS. Since posting this, I have not been able to reproduce the issue reliably! I will continue testing to learn more.

     

  • Posted by Sujith KV on Mar 20 2012 23:14 PM

    Hi Jay,

    Sorry for the delayed reply. Also, thanks for the inputs.

    The issue you are facing with the enet_echo application might be a software issue, as you pointed out.

    We will try to reproduce the issues here and see what's happening.

     [

    From your previous post :

    Still trying to figure this one out...  I'm stumped.

    I have learned much more about the 'sitaraware' implementation of the lwIP driver.  It seems that everything is intended to be done on an IRQ. 

    Yes. You are correct. Everything is done in TX/RX interrupt handlers.

    This works fine for some things, like extremely quick responses to a network client's request. But this approach has its disadvantages, and I think that is what I'm up against. These disadvantages are:

    1) Servicing other interrupts with very low latency requirements: because StarterWare is limited to one active IRQ at a time, my other important interrupts can't be served while the network is being serviced. Yes, the network servicing is fast, but is it always fast enough?

    StarterWare doesn't have a scheduler of its own, so it expects everything to be handled inside ISRs. As a result, while any ISR is being serviced, the latency of other pending IRQs increases.

    Anyway, StarterWare will come with optional prioritized interrupt handler support, which supports IRQ preemption. The support may come in the very next StarterWare release.

    2) Initiating UDP traffic: lwIP is non-reentrant. So if I send a UDP packet from the foreground (i.e. directly from main), lwIP may be interrupted by itself and re-entered, leading to instability. In my application, UDP sends need to be triggered by external events unrelated to the network.

    So, to at least work around difficulty number 2, I disabled interrupts whenever main called into lwIP. I would have expected this to fix the problem, but it does not! The lwIP stack still stops sending UDP packets after a while. I can't explain this.

    We will try this and see what is happening.

     ]

    Cheers,

    Sujith.

  • Posted by Jay Larson on Mar 21 2012 03:49 AM
    Attachment: enetEcho.c

    Thanks Sujith!

     

    I've attached my latest test version of the enetEcho.c file, which demonstrates the problem on my system after about 30-50 UDP packets. I tried it again this morning, running it through the debugger. It disables IRQs around every lwIP call. It should be simple to reproduce. Let me know if any other files are required.

    Regarding the prioritizing of the IRQs, this would be a great addition in my view. 

     

    Cheers,

    -Jay

  • Posted by Jay Larson on Mar 27 2012 03:54 AM

    Has anyone tried (or been able) to reproduce this with the attached file?

    Cheers,

    -Jay

  • Posted by Sujith KV on Mar 27 2012 03:58 AM

    Hi Jay,

    When we tried to reproduce the issue, it ran until packet 215. Below are the last few lines printed on the UART console:

    .......

    .......

    packet 214, PBuf at 0x80027684, ref = 1 sending...ref is now 2

    packet 215, PBuf at 0x80027724, ref = 1 sending...ref is now 2

    pbuf_alloc failed!

     

     Regards,

     Sujith

  • Posted by Jay Larson on Mar 27 2012 05:08 AM

    Hi Sujith,


    Then you have reproduced the issue. The pbuf_alloc failure is a symptom of it... If you had monitored the UDP packets (with Wireshark or similar), I expect you would have seen that only packets up to number 40 or so were actually sent on the wire. 

    pbuf_alloc fails because there are no pbufs left. There are no pbufs left because many pbufs (containing UDP packets) have not been sent. The reference count of a pbuf is decremented only after the ISR acknowledges that it has been sent. After some time these UDP packets are no longer sent, and the interrupt in the CPSW_CPDMA module seems to stop firing. 

    I don't know why the CPSW is getting stuck. I've traced into it and it all seems correct, yet it fails. I believe the root cause is a problem in the sitaraif.c port, which is responsible for the IRQ handling and is part of the StarterWare distribution. 

    So, I guess it is good news that you have reproduced the issue, the bad news is that we seem no closer to the solution. Is there anyone here familiar with the TI code in third-party/lwip-1.3.2/ports ?

    Cheers,

    -Jay
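The exhaustion arithmetic can be checked with a toy model. PBUF_RAM pbufs come from lwIP's heap (sized by `MEM_SIZE` in lwipopts.h), modeled here as a fixed number of slots. Each packet takes a slot with ref = 1, the driver adds a reference, and only the TX-complete ISR drops it; once the ISR stops firing, every further packet strands one slot, so allocation fails roughly one heap's worth of packets after the stall. The slot count of 16 is an arbitrary assumption, all names are hypothetical, and the model says nothing about *why* the ISR stops.

```c
#include <assert.h>

/* Assumed capacity: how many 80-byte PBUF_RAM pbufs fit in the lwIP heap. */
#define HEAP_SLOTS 16

static int slot_ref[HEAP_SLOTS];   /* 0 = free, >0 = live pbuf reference count */

/* Toy pbuf_alloc: first free slot with ref = 1, or -1 when the heap is full. */
static int toy_alloc(void)
{
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (slot_ref[i] == 0) {
            slot_ref[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Toy pbuf_free: drop one reference; the slot is reusable at zero. */
static void toy_release(int slot)
{
    assert(slot_ref[slot] > 0);
    slot_ref[slot]--;
}

/* Run the send loop; the TX-complete ISR only fires for packets
 * 1..isr_stops_after. Returns the number of the packet whose allocation
 * failed, or 0 if all max_packets were allocated. */
static int run_send_loop(int isr_stops_after, int max_packets)
{
    for (int i = 0; i < HEAP_SLOTS; i++)
        slot_ref[i] = 0;

    for (int packet = 1; packet <= max_packets; packet++) {
        int p = toy_alloc();             /* main: pbuf_alloc, ref = 1 */
        if (p < 0)
            return packet;               /* "pbuf_alloc failed!" */
        slot_ref[p]++;                   /* driver queues it, ref = 2 */
        if (packet <= isr_stops_after)
            toy_release(p);              /* TX ISR, ref = 1 */
        toy_release(p);                  /* main: pbuf_free, ref = 0 or stuck at 1 */
    }
    return 0;
}
```

For example, `run_send_loop(32, 1000)` returns 49: 32 good packets, 16 stranded slots, then the allocation failure.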

  • Posted by Sujith KV on Mar 27 2012 23:30 PM

    Hi Jay,

    I will try to help you.

    But as of now I can't see what the root cause might be :(. Note that pbuf_free is called twice here: once by the driver after the pbuf is sent, and once in the main loop. I'm not sure whether that causes any issue. Perhaps you have already tried without the pbuf_free in the main loop?

    Could you please check whether the TX buffer descriptors are written properly? (Just to make sure the link didn't break somewhere and cause the CPSW to stop sending.)

    Regards,

    Sujith.

  • Posted by Jay Larson on Mar 28 2012 11:43 AM

    Thank you for your help Sujith, let's see if we can figure this out.

    When I inspect the code in driver sitaraif.c I see that it increments the pbuf ref counter before sending the data.  It then decrements the reference counter when the interrupt notifies that it has been sent.  This is all good and proper, and seems to work fine as long as the interrupt keeps firing... 

    So, this should be what we have in our loop:

    1. We allocate a pbuf (reference count = 1).
    2. We pass the pbuf to the lwIP stack.
    3. Deep in the stack, the TI-written driver increments the reference count and prepares the pbuf for sending by queuing it in the CPSW (reference count = 2).
    4. The AM3359 sends (or should send) the queued pbuf and, once it is sent, fires the IRQ.
    5. The IRQ fires and the sitaraif.c driver ISR is called, which dequeues the pbuf and decrements the reference count (reference count = 1).
    6. Our foreground process spins for half a second.
    7. The foreground frees the pbuf (reference count = 0, assuming the IRQ fired!).

    What seems to be happening in our systems is that the IRQ stops firing after sending a few packets.  So step 5 is skipped.  Once this happens, the pbufs stop being freed. 

    I think that it is correct for our loop to call pbuf_free after we send.  Otherwise, there would be a pbuf memory leak, right?  pbufs are not actually available for further allocation until pbuf_free causes the reference count to be 0.

    But all this pbuf analysis is a symptom of the main problem:  the TI driver sitaraif.c stops sending packets after a while.  I don't think that there is a problem with either the test loop code or the lwIP code.  This test should be able to send UDP packets every 1/2 second forever, right? 

    I have checked with wireshark that the packets are sent on the wire correctly.  They are correct right up until the packet 30 or so.  After that, no more packets are sent.  The first packet that isn't sent also fails to generate an IRQ.

    I guess I could start tearing apart the TX buffer descriptors. I'm not all that familiar with how the AM335x Ethernet hardware works, but I can try. I was hoping that by using TI's driver along with lwIP, I wouldn't have to start debugging the StarterWare code.   

    Cheers,

    -Jay 
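The seven steps Jay lists can be modeled with nothing but a reference count, which also answers Sujith's concern about pbuf_free being called twice: the driver's free and the main loop's free each drop a different reference, so in whichever order they happen the pbuf is released exactly once. Leaving out the main-loop pbuf_free() leaks, and a missing ISR leaves ref stuck at 1 (the log shows `ref is now 2` because it prints before the main-loop free). `struct toy_pbuf` and the functions below are hypothetical stand-ins, not lwIP's pbuf API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an lwIP pbuf: only the reference count matters here. */
struct toy_pbuf {
    int ref;
    bool released;    /* set when the count reaches zero (memory returned) */
};

static void toy_ref(struct toy_pbuf *p)
{
    p->ref++;
}

static void toy_free(struct toy_pbuf *p)
{
    assert(p->ref > 0);           /* freeing a dead pbuf would be a real bug */
    if (--p->ref == 0)
        p->released = true;
}

/* Steps 1, 3, 5 and 7 from the list above. isr_first selects whether the
 * TX ISR (step 5) runs before or after the foreground free (step 7);
 * main_frees = false models leaving out the pbuf_free() in the loop. */
static struct toy_pbuf send_once(bool isr_fires, bool isr_first, bool main_frees)
{
    struct toy_pbuf p = { 1, false };   /* step 1: pbuf_alloc, ref = 1 */
    toy_ref(&p);                        /* step 3: driver queues it, ref = 2 */
    if (isr_fires && isr_first)
        toy_free(&p);                   /* step 5: TX ISR, ref = 1 */
    if (main_frees)
        toy_free(&p);                   /* step 7: foreground pbuf_free */
    if (isr_fires && !isr_first)
        toy_free(&p);                   /* a late ISR still balances the count */
    return p;
}
```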

     

  • Posted by Sujith KV on Mar 29 2012 05:10 AM

    Hi Jay,

    Thanks for your detailed analysis. As you said, pbuf_free is required in the main loop. But I wanted to share one observation with you. I know you are testing on the BeagleBone, and we could reproduce the issue here on a BeagleBone. But when the same test runs on the EVM, packet sending never fails. Pasted below are the messages printed around packet 1730:

    …

    …

    packet 1730, PBuf at 0x800140d8, ref = 1 sending...ref is now 1

    packet 1731, PBuf at 0x800140d8, ref = 1 sending...ref is now 1

    packet 1732, PBuf at 0x800140d8, ref = 1 sending...ref is now 1

    packet 1733, PBuf at 0x800140d8, ref = 1 sending...ref is now 1

    …

    …

    But at first glance I don't see why it causes a problem on the BeagleBone, since the code being executed in /ports/am335x is the same!

    Do you have an AM335x EVM to test with? If so, could you please check whether you see the same issue on the EVM?

    Regards,

    Sujith.

  • Posted by Jay Larson on Mar 29 2012 05:36 AM

    Hey Sujith,

     

    That is VERY interesting that it doesn't fail on the AM335x EVM. Could it be a hardware issue, or some other library code that differs between the EVM and the BeagleBone? I guess we should focus on the differences between the two. I would guess that an executable built for one will not run on the other... 

    btw, I'm using a Rev A3 BeagleBone.

    I'll start looking at the differences and let you know if I find anything.  I don't have the EVM, but I will look into getting one - but expect that I won't be able to justify the expense unless we know that the bone design is flawed. 

     

    Thanks again, Sujith, for looking into this. It's really encouraging to know that progress is being made! 

     

    Cheers,

    -Jay

     

     

     

  • Posted by Sujith KV on Mar 30 2012 07:04 AM

    Hi Jay,

    Sorry I haven't had a chance to look into this issue in detail. However, the difference between the BeagleBone and the EVM is that the BeagleBone uses the MII interface and the EVM uses RGMII. But this should not matter for an error like this, since at least some packets are sent and other examples work fine.

    Or could timing issues be causing errors that the code doesn't handle?

    Cheers,

    Sujith.

  • Posted by Jay Larson on Apr 10 2012 05:09 AM

    Hi Sujith,

    I've tested UDP code on StarterWare 2.00.00.06 and the same problem occurs. 

    I want to expand the sample code to print registers and structures before and after the apparent failure in order to try to diagnose what is happening.  Can you suggest which of these would be most useful?

    Cheers,

    -Jay 

  • Posted by Sujith KV on Apr 12 2012 02:59 AM

    Hello Jay, 

    StarterWare 02.00.00.06 was aimed at adding other features (no fixes for ethernet), as you might have already got to know from the release notes. The next StarterWare release may have enhancements for ethernet.

    I believe printing the CPPI RAM contents for the TX buffer descriptors might help. From the values in the txch structure, we can at least tell where it stopped sending. 

    I will also try to see what could cause this issue and let you know.
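A dump of the TX descriptor chain along these lines might show where the queue stops. The sketch assumes the usual four-word CPPI 3.0 descriptor layout and the SOP/EOP/OWNER/EOQ flag bits (31 down to 28) of word 3 as described in the AM335x TRM; verify both against the headers in your StarterWare tree. On the target, `head` would be the descriptor address read from the channel's head pointer or taken from the driver's `txch` bookkeeping; here the chain lives in ordinary memory so the code runs on a host (on a 64-bit host the pointer field makes the struct larger than its 16 bytes in CPPI RAM). `struct tx_bd` and `dump_tx_chain` are hypothetical names. An EOQ descriptor followed by one still marked OWNER would be a pattern worth looking for: it suggests a packet was linked after the DMA had already stopped at the end of the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Word 3 flag bits of a CPPI 3.0 descriptor (check against your headers). */
#define BD_SOP   0x80000000u   /* start of packet */
#define BD_EOP   0x40000000u   /* end of packet */
#define BD_OWNER 0x20000000u   /* set by software, cleared by the DMA when done */
#define BD_EOQ   0x10000000u   /* set by the DMA when it found next == NULL */

/* Hypothetical mirror of a TX descriptor; on the 32-bit target "next" is a
 * 32-bit address and the whole descriptor is four words in CPPI RAM. */
struct tx_bd {
    struct tx_bd *next;       /* word 0: next descriptor, NULL ends the queue */
    uint32_t bufptr;          /* word 1: buffer address */
    uint32_t bufoff_len;      /* word 2: buffer offset and length */
    uint32_t flags_pktlen;    /* word 3: flags and packet length */
};

struct bd_summary {
    int total;              /* descriptors visited from the head */
    int owned_by_hw;        /* still marked OWNER (not yet sent) */
    int eoq_index;          /* position of the first EOQ descriptor, -1 if none */
    int owner_after_eoq;    /* 1 if an unsent descriptor follows an EOQ */
};

/* Walk the chain (bounded by max in case the links form a loop), print each
 * descriptor's flags and summarize where the queue stands. */
static struct bd_summary dump_tx_chain(const struct tx_bd *head, int max)
{
    struct bd_summary s = { 0, 0, -1, 0 };

    assert(max > 0);
    for (const struct tx_bd *bd = head; bd != NULL && s.total < max; bd = bd->next) {
        uint32_t f = bd->flags_pktlen;
        printf("bd %d at %p: %s%s%s%s\n", s.total, (const void *)bd,
               (f & BD_SOP) ? "SOP " : "", (f & BD_EOP) ? "EOP " : "",
               (f & BD_OWNER) ? "OWNER " : "", (f & BD_EOQ) ? "EOQ " : "");
        if (f & BD_OWNER) {
            s.owned_by_hw++;
            if (s.eoq_index >= 0)
                s.owner_after_eoq = 1;
        }
        if ((f & BD_EOQ) && s.eoq_index < 0)
            s.eoq_index = s.total;
        s.total++;
    }
    return s;
}
```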

    Btw, in /drivers/phy.c, in the PhyAutoNegStatusGet() API, please replace the line "if(PHY_AUTONEG_INCOMPLETE == (data & (PHY_AUTONEG_STATUS)))" with "if(PHY_AUTONEG_COMPLETE == (data & (PHY_AUTONEG_STATUS)))". This is a bug in phy.c which can sometimes cause link detection to fail. It will not affect the scenario we are discussing, but I wanted to let you know.

    Regards,

    sujith.

  • Posted by Sujith KV on Apr 13 2012 00:29 AM

    HI Jay,

    I had a look at the example and added a few lines to print the HDR (head descriptor pointer) and CP (completion pointer) values, and wanted to keep you updated.

    The Tera Term log is given below.

    ....

    packet 15, PBuf at 0x80014260, ref = 1 sending...
    EOQ reached. HDR value:0x4A102154
    Tx ISR. CP value:0x4A102154ref is now 1
    packet 16, PBuf at 0x80014260, ref = 1 sending...
    EOQ reached. HDR value:0x4A102168
    Tx ISR. CP value:0x4A102168ref is now 1
    packet 17, PBuf at 0x80014260, ref = 1 sending...
    EOQ reached. HDR value:0x4A10217Cref is now 2
    packet 18, PBuf at 0x80014300, ref = 1 sending...ref is now 2
    packet 19, PBuf at 0x800143a0, ref = 1 sending...ref is now 2

    ....

    Surprisingly, I could not see any mistake in the execution or the code: even for the packet that failed to send, EOQ is reached and the HDR is written with the proper buffer descriptor address. Yet the DMA engine fails to send the packet out! I checked the memory values for HDR and CP, and they are 0x4A10217C and 0x4A102168 respectively.

    Newly added lines for printing in sitaraif_tx_inthandler:

        UARTPuts("\n\rTx ISR. CP value:", -1);
        UARTPutHexNum((u32_t)curr_bd);
        /* Acknowledge the CPSW and free the corresponding pbuf */
        CPSWCPDMATxCPWrite(sitaraif->cpsw_cpdma_base, 0, (u32_t)curr_bd);

    Newly added lines for printing in

        if(curr_bd->flags_pktlen & CPDMA_BUF_DESC_EOQ) {
          UARTPuts("\n\rEOQ reached. HDR value:", -1);
          UARTPutHexNum((u32_t)active_head);
          /* ... rest of the existing EOQ handling ... */
        }

     

    Regards,

    Sujith.
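In this trace the descriptor for the lost packet was queued and the head pointer looked correct, yet nothing went out. For comparison, here is the append rule CPPI-style DMA drivers generally rely on when linking a new descriptor onto a queue: if the previous tail already shows EOQ with OWNER cleared, the DMA stopped before it saw the new link, so the channel has to be restarted by writing the head pointer again (the Linux davinci_cpdma driver handles this case in a similar way). This is a host-side simulation with hypothetical names, not the sitaraif.c code, and it does not by itself explain the BeagleBone stall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BD_OWNER 0x20000000u   /* descriptor still belongs to the DMA */
#define BD_EOQ   0x10000000u   /* DMA reached this descriptor with next == NULL */

/* Trimmed, hypothetical descriptor: only the link and the flag word. */
struct tx_bd {
    struct tx_bd *next;
    uint32_t flags_pktlen;
};

/* Simulated TX channel; hdp stands in for the head descriptor pointer register. */
struct tx_chan {
    struct tx_bd *hdp;    /* non-NULL while the DMA is working through a queue */
    struct tx_bd *tail;   /* last descriptor software queued */
    int hdp_writes;       /* how many times software (re)started the channel */
};

/* Simulated DMA: send everything from hdp onward, clear OWNER on each
 * descriptor, set EOQ on the one whose next is NULL, then go idle. */
static void dma_run(struct tx_chan *ch)
{
    for (struct tx_bd *bd = ch->hdp; bd != NULL; bd = bd->next) {
        bd->flags_pktlen &= ~BD_OWNER;
        if (bd->next == NULL)
            bd->flags_pktlen |= BD_EOQ;
    }
    ch->hdp = NULL;
}

/* Queue one descriptor behind the current tail. If the DMA already stopped
 * at the old tail (EOQ set, OWNER cleared), linking alone is not enough and
 * the channel must be restarted by writing the head pointer again. */
static void tx_append(struct tx_chan *ch, struct tx_bd *bd)
{
    bd->next = NULL;
    bd->flags_pktlen = (bd->flags_pktlen | BD_OWNER) & ~BD_EOQ;

    if (ch->tail == NULL) {
        ch->hdp = bd;                      /* first packet: start the channel */
        ch->hdp_writes++;
    } else {
        ch->tail->next = bd;               /* link behind the previous tail */
        uint32_t prev = ch->tail->flags_pktlen;
        if ((prev & (BD_EOQ | BD_OWNER)) == BD_EOQ) {
            ch->tail->flags_pktlen = prev & ~BD_EOQ;
            ch->hdp = bd;                  /* restart at the new descriptor */
            ch->hdp_writes++;
        }
    }
    ch->tail = bd;
}
```

Real drivers also have to respect the TRM's rules on when the head pointer may be written and must handle completion interrupts; this sketch does not model either.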
