AM3352: HTTP Server issue

Part Number: AM3352

Dear TI support team,

I have a problem using the TI RTOS HTTP server.
The problem occurs with our custom board (AM3352) as well as with the BeagleBone Black.

I created a new TI RTOS project (toolchain details below) for the BeagleBone Black board.
Then I set up the TCP/IP settings and the HTTP server using the XGCONF tool.
Next, I added the required website files and the NDK hooks (add/remove website file), and I adjusted the network buffer sizes and the heap size.

The application starts as well as the TCP/IP stack and the HTTP server.
As long as I only send ICMP Echo Requests ('Ping') - also with bigger data sizes (e.g. 2 KiB) - the TCP/IP stack seems to be stable.

But when I start to send HTTP requests via web browser (once or continuously), the HTTP server and the whole TCP/IP stack stop working after a random time (ranges from 3 min to 20 min).
In this state, the device no longer responds to any network traffic (neither HTTP nor ICMP Echo) and the console starts to output 'Retransmit Timeout' messages (see the short log below).

The application and the SYS/BIOS kernel continue to work, and other tasks (in this case only a 1-second UART output task) work as expected.
So the TI RTOS does not seem to crash or stop.

Even if I stop sending further HTTP or ICMP Echo requests, additional 'Retransmit Timeout' messages appear.


To reduce the number of possible error reasons, I used a very simple website for this test application:
* Index HTML only
* no CGI
* no auth-protected websites
* no auth hook function
* Browser reload time 10 s (in case of continuous mode)

Network setup:
Sitara AM3352: 192.168.1.20 (static)
PC: 192.168.1.10 (static)


Tools used:
* CCS 9.0.0.00018
* PDK 1.0.15
* NDK 3.60.0.13
* EDMA3 2.12.5
* SYS/BIOS 6.75.2.00
* XDCtools 3.50.8.24
* Compiler: GNU v7.2.1 Linaro

Console log output:

enter main()
Start EMAC Init
enter taskFxn()
00000.000 Network Added:
00000.000 If-1:192.168.1.20

00000.000 Service Status: HTTP     : Enabled  :          : 000

01731.800 TcpTimeoutRexmt: Retransmit Timeout
01732.100 TcpTimeoutRexmt: Retransmit Timeout
01741.400 TcpTimeoutRexmt: Retransmit Timeout
01741.600 TcpTimeoutRexmt: Retransmit Timeout
01750.200 TcpTimeoutRexmt: Retransmit Timeout
01750.400 TcpTimeoutRexmt: Retransmit Timeout
01755.800 TcpTimeoutRexmt: Retransmit Timeout
01756.100 TcpTimeoutRexmt: Retransmit Timeout
01759.000 TcpTimeoutRexmt: Retransmit Timeout
01759.200 TcpTimeoutRexmt: Retransmit Timeout
01765.400 TcpTimeoutRexmt: Retransmit Timeout
01765.600 TcpTimeoutRexmt: Retransmit Timeout
01774.200 TcpTimeoutRexmt: Retransmit Timeout
01774.400 TcpTimeoutRexmt: Retransmit Timeout
01783.000 TcpTimeoutRexmt: Retransmit Timeout
01783.200 TcpTimeoutRexmt: Retransmit Timeout
01803.800 TcpTimeoutRexmt: Retransmit Timeout
01804.100 TcpTimeoutRexmt: Retransmit Timeout

You will find the BeagleBone Black CCS project attached to this message.

Kind regards,
Markus

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/Test_5F00_HTTP_5F00_Server.7z

  • Hi,

    Thanks for the information! In the .cfg file, Tcp.transmitBufSize is set to 16384. Please try changing this to 32768 or 65536 to see if it helps. Also, would you be comfortable adding some debug code to the AM335x NIMU driver? If so, in pdk_am335x_1_0_15\packages\ti\transport\ndk\nimu\packages\ti\transport\ndk\nimu\src\v4\nimu_eth.c,

    add a global counter, packetDropped (initialized to 0), to track whether any packets are dropped at the EMAC send level:

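    /* At file scope: debug counter for frames the EMAC refused to send */
    uint32_t packetDropped = 0;

    /* In the send path: */
    sendResult = emac_send(0, &csl_send_pkt);

    if (sendResult)
    {
        packetDropped++;

        /* Log the actual return code of emac_send() */
        NIMU_drv_log1("CPSW_sendPacket() returned error %08x\n", sendResult);

        /* Free the packet as the packet did not go on the wire */
        PBM_free( (PBM_Handle)csl_send_pkt.AppPrivate );
    }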

    After the above code change, you need to rebuild the NIMU driver for it to take effect, using a command such as "pdk_am335x_1_0_15\packages>gmake nimu" at the top level of the PDK installation. Then rebuild your test application so that it links against the new NIMU library.

    Regards, Eric

  • Hello Eric,

    thank you for the fast response.

    I have changed the TCP transmit buffer size to 65536 and implemented the dropped-packet counter as you suggested.
    Unfortunately, this doesn't seem to make a difference.

    I did several tests with the above modifications on the BBB hardware:

    Test #1:
    * HTTP reload time 10 s
    * ICMP Echo: 1x 32 B, 1x 2048 B

    00266.200 TcpTimeoutRexmt: Retransmit Timeout
    00266.500 TcpTimeoutRexmt: Retransmit Timeout
    00275.800 TcpTimeoutRexmt: Retransmit Timeout
    00276.100 TcpTimeoutRexmt: Retransmit Timeout
    00284.500 TcpTimeoutRexmt: Retransmit Timeout
    00284.700 TcpTimeoutRexmt: Retransmit Timeout
    00290.200 TcpTimeoutRexmt: Retransmit Timeout
    00290.500 TcpTimeoutRexmt: Retransmit Timeout

    Packet Drop Counter = 0

    Test #2:
    * same as Test #1

    00268.400 TcpTimeoutRexmt: Retransmit Timeout
    00268.700 TcpTimeoutRexmt: Retransmit Timeout
    00278.000 TcpTimeoutRexmt: Retransmit Timeout
    00278.300 TcpTimeoutRexmt: Retransmit Timeout
    00292.400 TcpTimeoutRexmt: Retransmit Timeout
    00292.700 TcpTimeoutRexmt: Retransmit Timeout
    00302.000 TcpTimeoutRexmt: Retransmit Timeout
    00302.300 TcpTimeoutRexmt: Retransmit Timeout

    Packet Drop Counter = 0

    Test #3:
    * same as Test #2 but HTTP reload interval increased to 20 s

    00270.400 TcpTimeoutRexmt: Retransmit Timeout
    00270.600 TcpTimeoutRexmt: Retransmit Timeout
    00294.400 TcpTimeoutRexmt: Retransmit Timeout
    00294.600 TcpTimeoutRexmt: Retransmit Timeout
    00342.400 TcpTimeoutRexmt: Retransmit Timeout
    00342.600 TcpTimeoutRexmt: Retransmit Timeout
    00402.400 TcpTimeoutRexmt: Retransmit Timeout
    00402.600 TcpTimeoutRexmt: Retransmit Timeout

    Packet Drop Counter = 0

    In all three test cases, the problem appears to start at about the same time (compare the timestamps).
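    To put a number on "about the same time", here is a small sketch that takes the first "Retransmit Timeout" timestamp of each test run from the logs above and computes how far apart the onsets are. The array and function names are illustrative only.

```c
#include <stddef.h>

/* First "Retransmit Timeout" timestamp (seconds since boot) of each run,
 * copied from the logs above: Test #1, Test #2, Test #3. */
static const double kOnset[] = { 266.2, 268.4, 270.4 };

/* Returns max(t) - min(t) over n >= 1 values: the spread of the onsets. */
static double onset_spread(const double *t, size_t n)
{
    double lo = t[0];
    double hi = t[0];
    for (size_t i = 1; i < n; ++i) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return hi - lo;
}
```

    With these values the spread is about 4.2 s, so all three failures began within a single 10 s reload period, even though Test #3 sent HTTP requests only half as often. That points to the failure tracking uptime rather than the number of requests served.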

    Kind regards,
    Markus

  • Hi Markus,

    I'm working on getting a BBB set up so that I can reproduce the issue you're seeing. In the meantime:

    1. It seems your PC and BBB are connected through a home router network. Can you confirm?
    2. Were you able to get a Wireshark capture of this problem?

    Steve

  • Hello Steve,

    the computer and the BBB are connected directly with a network cable.
    There is no other device in this network.
    The computer has (static) IP 192.168.1.10, the BBB has (static) IP 192.168.1.20.

    I have reproduced the issue and made a Wireshark capture.
    It is attached to this message.

    The last successful HTTP response packet from the BBB is packet #587 (timestamp 252).
    The next attempt (timestamp 262, packet #588) does not receive a response.

    At this point the BBB also does not respond to an ICMP Echo (Ping) anymore.

    Kind regards,
    Markus

    TI_HTTP_Server_Failure.pcapng.gz

  • Markus,

    Getting a hold of the hardware needed to reproduce your issue is proving to be a challenge. In the meantime, I'm wondering if you can try something else on your end...

    Can you please try to open the ROV tool to help debug this issue? (Please halt the app at the problem point [when pings are not being responded to] before opening ROV.)

    You can find ROV in CCS under the "Tools > ROV" menu option (See here for some more details on ROV).

    Once you have ROV open, please open the Task view (you want the "detailed view"). What do you see in the Task list? (you might also check the call stack tab to see where each thread is at in the code).

    (Screen shots of what you see would be welcome, too)

    Steve

     

  • Hello Steven,

    I reproduced the issue, stopped the target and took two screenshots of the ROV Task data.

    Detailed tab:

    Call stacks tab:

    Kind regards,
    Markus

  • Hi Markus,

    Thanks for those screen shots. Everything is blocked (as you can see), with only the idle task running.

    So, there's no "energy" coming in to get things moving. I'm wondering if you are even getting any RX interrupts from the EMAC?

    When the problem happens, can you try putting a breakpoint in the ISR function? I don't know this driver very well, but I think it should be this function (in the same cpsw_nimu_eth.c lding pointed you to above):

    nimu_rx_pkt_cb()

    (Hopefully lding can confirm.)

    So, maybe you should restart the app, set the b/p, then run and verify that you hit it for RX traffic.

    After you verify, unset it and reproduce the issue. Then set the b/p again. Do you still hit the b/p?
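    Halting the target at a breakpoint may itself perturb the stack's timing. A less intrusive alternative is a counter bumped inside the RX callback and watched in the CCS Expressions view while the app keeps running (if your debug setup supports reading memory from a running target). A minimal sketch, assuming you add one line at the top of nimu_rx_pkt_cb(); the names g_rxCbCount, rx_cb_instrumentation() and rx_activity_since_last_check() are hypothetical, and the callback's real signature is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical debug counter. volatile so every read (debugger or code)
 * fetches the current value from memory. Single writer: the RX callback. */
volatile uint32_t g_rxCbCount = 0;

/* Stand-in for the one line you would add inside nimu_rx_pkt_cb(). */
static void rx_cb_instrumentation(void)
{
    g_rxCbCount++;
}

/* Hypothetical helper for a periodic task (e.g. the 1 s UART task):
 * returns 1 if the RX callback ran at least once since the previous call. */
static int rx_activity_since_last_check(void)
{
    static uint32_t lastSeen = 0;
    uint32_t now = g_rxCbCount;
    int active = (now != lastSeen);
    lastSeen = now;
    return active;
}
```

    Printing the result from the existing 1-second UART task would show, without stopping the CPU, whether RX callbacks stop at the moment pings go unanswered.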

    Steve

  • Did this get resolved?

    [10/22/19 Update: Marking this as TI Thinks Resolved due to no activity from original poster.]

  • Hello Todd,

    no, this did not get resolved.

    With all due respect:

    * I provided a fully functional CCS example project
    * the example uses only standard elements (IP, TCP, HTTP Server) from TI RTOS, with only the minimum necessary additional code (providing "index.html" for the web server, MAC/PHY init)
    * I used an evaluation board that the SDK supports (BBB)
    * one of your (very friendly and helpful) staff told me they would investigate this problem by running the uploaded CCS project on a BBB at your site

    So, how can this be resolved?
    From my side, it looks like a bug in the HTTP server software that also causes the TCP/IP stack to crash after a few minutes, and on non-custom hardware: the "verified" BBB.

    Regarding the threads:

    If the TI HTTP server and the TCP/IP stack run (=are functional at the beginning), I have nearly the same thread states - all but the "Idle" thread are in "blocked" state.
    Maybe this occurs because the debugger (a Spectrum Digital XDS220 ISO, btw) stops the target?

    Thank you,
    Markus

  • Hi Markus,

    Unfortunately we don't have a "Closing for no activity" button, so we use "TI Thinks Resolved". I know, not ideal. You did the right thing, though, and rejected it. :)

    I'll let Steve follow-up with the issue now.

    Todd

  • Hi Markus,

    Were you able to try the test of the EMAC ISR break points?

    Thanks,

    Steve


  • Hello Steven,

    I was able to set a breakpoint in this function.
    The breakpoint is hit continuously in the working condition.

    I reproduced the issue, stopped the target and set the breakpoint.
    If I send traffic (e.g. ICMP Echo), the breakpoint is still hit.

    I resumed the target (CCS F8 key or the "play" button) several times, hitting the breakpoint each time.
    But the more often I resumed the target, the longer it took to reach the breakpoint again.

    After a few more resumes, the breakpoint was no longer hit, regardless of whether I was pinging the AM3352 or not.

    Even if there is no active traffic (checked with Wireshark), the "TcpTimeoutRexmt: Retransmit Timeout" messages continue to appear in the console.

    For verification: I have set the breakpoint at the following location:

    Regards,
    Markus

  • Dear Todd et al.,

    Without intending to be demanding, I kindly ask you and the whole support team for a root-cause analysis of the networking issue raised by Markus.

    Our S/W architecture decision in favor of TI-RTOS with the TI networking stack was strongly influenced by the assumption that this setup would be robust and stable, given the many TI users worldwide who very likely build their applications on the same basic setup.

    Since the malfunction was demonstrated using the well-known and widely-used BBB evaluation board w/ only a very limited custom application (i.e. minimalistic index web page) on top, the likelihood that the malfunction is caused by the application is extremely low.

    We are now in a situation where it seems we cannot offer a stable web-server-based UI for an order that is due next month.

    Therefore, I kindly ask you and your team to streamline efforts and support us in this challenging and time-sensitive "hot" project phase.

    Thanks in advance.

    BR David

  • David,

    We are trying to reproduce it. Due to some vacations, we were a little backed up. We should have a better idea of what is going on early next week.

    Todd

  • Hi Markus,

    I was able to reproduce the problem on BBB with Processor SDK RTOS 6.0.0. The issue you described in your post happened within 20 mins.

    After I made the following change in C:\ti_am3_600\ndk_3_60_00_13\packages\ti\ndk\stack\route\rtable.c:

    Change

    static uint32_t _RtNoTimer = 0;    /* Set to disable timer */

    to

    static uint32_t _RtNoTimer = 1;    /* Set to disable timer */

    and rebuilt the NDK using ndk.mak and then the Test_HTTP_Server project, the problem went away.
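    The only documentation of the flag in the thread is its comment, "Set to disable timer". The sketch below is a hypothetical model, not the NDK source: it assumes the flag makes the route table's periodic timeout handler return early, so entries with a lifetime are never aged out. RouteEntry, rtNoTimer and route_timer_tick() are invented names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: expiresAt == 0 means "no lifetime" (static). */
typedef struct {
    uint32_t expiresAt;  /* tick at which the entry ages out */
    int      valid;      /* nonzero while the entry is in the table */
} RouteEntry;

/* Models _RtNoTimer: nonzero disables aging (assumption, see lead-in). */
static uint32_t rtNoTimer = 0;

/* Models a periodic route-table timeout handler. Unless disabled, it
 * invalidates entries whose lifetime has passed and returns how many. */
static size_t route_timer_tick(RouteEntry *tbl, size_t n, uint32_t now)
{
    size_t removed = 0;
    if (rtNoTimer)
        return 0;  /* timer disabled: nothing is ever aged out */
    for (size_t i = 0; i < n; ++i) {
        if (tbl[i].valid && tbl[i].expiresAt != 0 && now >= tbl[i].expiresAt) {
            tbl[i].valid = 0;
            ++removed;
        }
    }
    return removed;
}
```

    If the real flag behaves like this model, the trade-off is that expired entries simply stay in the table. The thread does not identify why the timer path leads to the stall, so the change reads as a workaround rather than a root-cause fix.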

    Can you try this on your side and let us know the result?

    Ming

  • Hello Ming Wei,

    I tested your solution for several hours on a BBB and our custom hardware and it looks like the problem is gone.
    No stack crashes or "retransmission errors" so far.

    This fixes the problem not only for the TI HTTP server but also for a plain TCP/IP (sockets) server.

    Thank you very much,
    Markus