This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

Ping delays get longer then drop and restart

Other Parts Discussed in Thread: AM3352

Hi,

     We are running SYS/BIOS with NDK on our production board using the CPSW interface. Our current software versions:

Processor AM3352
Code Composer Version: 6.0.1.00040
Compiler TI v5.1.6
SYS/BIOS 6.40.3.39
XDCTools 3.30.3.47_core
NDK 2.24.0.11

     We currently are running a fixed IP address with four TCP sockets being created and serviced. The PC application uses the four sockets for different data and there is also a command window running a 'ping' command on the PC. The pings are 1418 bytes in size and have a good response for awhile until it hits some type of issue. The ping response times will increase from <1msec to approx 40msec and then miss a single ping before returning to a ping response of under <1msec. This anomaly happens regardless of whether the PC application is connected or not. If we run a default ping size of 32bytes this issue does not happen. Is there NDK debug information or configuration that could help pinpoint what is going on?

Thanks,

     John C. 

Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=47ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=46ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=44ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=43ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=42ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=41ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=39ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=37ms TTL=255
Reply from 10.1.9.200: bytes=1418 time=36ms TTL=255
Request timed out.
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255
Reply from 10.1.9.200: bytes=1418 time<1ms TTL=255

  • Hi John,

    Is this on a closed network? Because I want to make sure nothing is going on with the network unrelated to the device.

    Can you readily reproduce this? Does it always follow the pattern of slower response, drop one and then back to < 1ms?

    Have you tried to isolate it to the processing going on with the other sockets? For example, remove one of them at a time to see if the behavior persists. (or does disconnecting the PC application accomplish this?)

    How long before the behavior start?

    Are there any log messages (you can see them in Tools->ROV in CCS) or console output. It could be that you are running out of buffers. There should be some console or log message to denote this.

    Here's a write-up on how the memory buffers are used and how you can tweak them. http://processors.wiki.ti.com/index.php/TI-RTOS_Networking_Stack_Memory_Usage

    You can enable more NDK logging. Please to the "Controlling Debug Messages" section in the NDK User Guide for more details on how to control this. I'd use the runtime APIs instead of the graphical method. It looks like we have a 0 vs 1 debug with the enums in the graphical method that we are currently looking at.

    Todd

  • Hey Todd,

         Here are a few quick answers:

    - The network is not closed but because it is connected to a modern switch, it should only be seeing broadcast packets that are not directly addressed to it.

    - It is very reproducible and occurs within 10 minutes of starting the device.

    - Disconnecting the PC application means the only traffic the device sees are the pings.

    - I've tried looking into the ROV but there do not seem to be any logs that I can find. I probably need to set something up in the UIA configuration logging section but I'm not sure how NDK logging is tied into the overall UIA logging. Which one of the seven UIA loggers should I use?

    - I looked at the debug capabilities of the NDK, the "Stack Debug Trace Level" is set in the NDK configuration, but again I'm not sure where this is supposed to show up or what other configurations I need to set to make them work (I do not see anything in the console window.)

    Thanks for any help,

         John C.

  • Hey Todd,

         As an update I tried the pings from a Linux box and got some slightly different results. The pings again cruse along at under 1msec and then jump to near 50msec and slowly decrease until they are back under 1msec. I'm not sure how frequently it occurs but on my current test with nothing running but the pings, it was the 466th ping when it jumped to 42msec.

    John C.

    1426 bytes from 10.1.10.43: icmp_req=724 ttl=255 time=0.496 ms
    1426 bytes from 10.1.10.43: icmp_req=725 ttl=255 time=0.573 ms
    1426 bytes from 10.1.10.43: icmp_req=726 ttl=255 time=0.559 ms
    1426 bytes from 10.1.10.43: icmp_req=727 ttl=255 time=49.0 ms
    1426 bytes from 10.1.10.43: icmp_req=728 ttl=255 time=48.2 ms
    1426 bytes from 10.1.10.43: icmp_req=729 ttl=255 time=46.6 ms
    1426 bytes from 10.1.10.43: icmp_req=730 ttl=255 time=44.9 ms
    1426 bytes from 10.1.10.43: icmp_req=731 ttl=255 time=43.0 ms
    1426 bytes from 10.1.10.43: icmp_req=732 ttl=255 time=42.0 ms
    1426 bytes from 10.1.10.43: icmp_req=733 ttl=255 time=40.8 ms
    1426 bytes from 10.1.10.43: icmp_req=734 ttl=255 time=40.0 ms
    1426 bytes from 10.1.10.43: icmp_req=735 ttl=255 time=39.3 ms
    1426 bytes from 10.1.10.43: icmp_req=736 ttl=255 time=37.8 ms
    1426 bytes from 10.1.10.43: icmp_req=737 ttl=255 time=36.6 ms
    1426 bytes from 10.1.10.43: icmp_req=738 ttl=255 time=35.0 ms
    1426 bytes from 10.1.10.43: icmp_req=739 ttl=255 time=33.7 ms
    1426 bytes from 10.1.10.43: icmp_req=740 ttl=255 time=32.1 ms
    1426 bytes from 10.1.10.43: icmp_req=741 ttl=255 time=30.1 ms
    1426 bytes from 10.1.10.43: icmp_req=742 ttl=255 time=29.7 ms
    1426 bytes from 10.1.10.43: icmp_req=744 ttl=255 time=19.7 ms
    1426 bytes from 10.1.10.43: icmp_req=745 ttl=255 time=18.3 ms
    1426 bytes from 10.1.10.43: icmp_req=746 ttl=255 time=17.0 ms
    1426 bytes from 10.1.10.43: icmp_req=747 ttl=255 time=15.4 ms
    1426 bytes from 10.1.10.43: icmp_req=748 ttl=255 time=14.0 ms
    1426 bytes from 10.1.10.43: icmp_req=749 ttl=255 time=12.7 ms
    1426 bytes from 10.1.10.43: icmp_req=750 ttl=255 time=11.2 ms
    1426 bytes from 10.1.10.43: icmp_req=751 ttl=255 time=9.13 ms
    1426 bytes from 10.1.10.43: icmp_req=752 ttl=255 time=7.92 ms
    1426 bytes from 10.1.10.43: icmp_req=753 ttl=255 time=6.83 ms
    1426 bytes from 10.1.10.43: icmp_req=754 ttl=255 time=6.00 ms
    1426 bytes from 10.1.10.43: icmp_req=755 ttl=255 time=4.25 ms
    1426 bytes from 10.1.10.43: icmp_req=756 ttl=255 time=3.03 ms
    1426 bytes from 10.1.10.43: icmp_req=757 ttl=255 time=1.34 ms
    1426 bytes from 10.1.10.43: icmp_req=758 ttl=255 time=0.581 ms
    1426 bytes from 10.1.10.43: icmp_req=759 ttl=255 time=0.543 ms

  • Hi John,

    Strange. I tried running a similar test (e.g. large ping size) on a different board and it ran fine for several hours. Did you change the Clock tick frequency? The default is to have a Clock tick every 1000 microseconds.

    Can you have the linux box ping a different target (e.g. a Window box) to rule it out (I know...it's a long shot!)?

    Regarding the debug output...it goes to the same place System_printf goes. So if you have SysMin as the System provider, it goes into an internal buffer that can be viewed in Tools->ROV->SysMin in CCS. If you have SysStd, it goes to the CCS console.

  • Hey Todd,

        Sorry to take so long to reply but I'm working on two problems at the same time. The first is this one and the second is when NDK has been running for 8-12 hours it stops working even though the rest of the system is still going (I'll have to start a thread on that.)

        When you reference the Clock Tick I assume you you are referencing the SYS/BIOS clock, which is currently set to 5000usecs. Is there a specific reason that you would need to drive sys/bios faster? One msec is getting kinda fast for the OS clock tick.

        I've put in a picture of the SysMin tab in the ROV as I'm not really sure what it is supposed to be telling me.

    Thanks,

        John C.

  • Hi John,

    Yes, I was referring to the kernel's clock. Can you change it from 5000 to 1000 to see if the issue goes away? You'll need to rebuild the application after the change. I'm looking at the code base and I think we still might have an issue with a tick rate of something besides 1000 with the NDK.

    The frequency is really dependent on your application, which is why we make it programmable:). If you do not need 1ms granularity on things like Task_sleep, Semaphore_pend, etc. 5ms is fine.

    The output for SysMin is in the "OutputBuffer" tab. Based on the fact that the outBufIndex is 0 on the snapshot you should me, I expect the buffer is empty. SysMin maintains an internal buffer that all the System_printf output goes to.

    I wish I could say I'm looking forward to the 8-12 hour failure post...

    Todd

  • Hey Todd,

        I changed the clock tick to 1000usec and the DHCP task no longer appears to run, so no IP address. If I shift it back to 5000usecs everything is fine. What am I missing here?

    John C.

    1000usec clock:

    5000usec:

  • Todd,

        I don't see any differences in which of the AM3352's DMTimer's are used and the period looks to be approx 5 times bigger on the 5000usec clock.

    Timers at 5000usec clock tick…

    Timers at 1000usec clock tick…

  • Hey Todd,

    I also tried using a static IP address configuration with the Clock tick set to 1000 and the system spits out a couple console errors and is unresponsive to pings (our serial port also hangs but other system components run.) Nothing else was changed in the system except the clock tick, then a complete rebuild.

    [CortxA8] Starting main loop1
    00000.000 Illegal priority call to llEnter()
    00000.000 Illegal call to llExit()

    So I'm guessing that I can't go with a 1000usec tick? Could all this be happening because we are using the CPSW interface into the NDK (USE_CPSW_DRIVER not used by many customers with the NDK)?

    Thanks,
    John C.
  • Is there a chance you could export a test project that has the symptoms? I'd like to try and reproduce it here.

    Todd

  • Todd,

    Because everything is running on our hardware you'd need to have the whole thing. I think at this point I'll focus on trying to get some debug information on the NDK to see if there is some clue in there. We've had to make a couple minor hardware related tweaks in the sys_bios_driver CPSW build of 1.1.0.5 SDK. In there I see "#ifdef DEBUG_STATS" and was hoping if I rebuild with that on I might be able to get a better sense of where the issue might be. Is there anything else I could do that might help? We rebuild the sys_bios_driver, usblib and platform for the SDK, then rebuild our system.

    John C.
  • Todd,

             I've turned on debug in sys_bios_driver and ran the large ping test with the following Linux and Windows results.

    Ubuntu on ARM (Utilite):

    1426 bytes from 10.1.10.122: icmp_req=856 ttl=255 time=0.598 ms
    1426 bytes from 10.1.10.122: icmp_req=857 ttl=255 time=0.653 ms
    1426 bytes from 10.1.10.122: icmp_req=858 ttl=255 time=40.9 ms
    1426 bytes from 10.1.10.122: icmp_req=859 ttl=255 time=40.0 ms
    1426 bytes from 10.1.10.122: icmp_req=860 ttl=255 time=38.9 ms
    1426 bytes from 10.1.10.122: icmp_req=861 ttl=255 time=38.2 ms
    1426 bytes from 10.1.10.122: icmp_req=862 ttl=255 time=36.3 ms
    1426 bytes from 10.1.10.122: icmp_req=863 ttl=255 time=35.1 ms
    1426 bytes from 10.1.10.122: icmp_req=864 ttl=255 time=33.5 ms
    1426 bytes from 10.1.10.122: icmp_req=865 ttl=255 time=32.0 ms
    1426 bytes from 10.1.10.122: icmp_req=867 ttl=255 time=21.7 ms
    1426 bytes from 10.1.10.122: icmp_req=868 ttl=255 time=19.1 ms
    1426 bytes from 10.1.10.122: icmp_req=869 ttl=255 time=17.9 ms
    1426 bytes from 10.1.10.122: icmp_req=870 ttl=255 time=16.7 ms
    1426 bytes from 10.1.10.122: icmp_req=871 ttl=255 time=15.3 ms
    1426 bytes from 10.1.10.122: icmp_req=873 ttl=255 time=11.5 ms
    1426 bytes from 10.1.10.122: icmp_req=874 ttl=255 time=9.50 ms
    1426 bytes from 10.1.10.122: icmp_req=876 ttl=255 time=1.65 ms
    1426 bytes from 10.1.10.122: icmp_req=877 ttl=255 time=0.538 ms
    1426 bytes from 10.1.10.122: icmp_req=878 ttl=255 time=0.554 ms


    Windows 7 cmd console results:


    Reply from 10.1.10.122: bytes=1418 time<1ms TTL=255
    Reply from 10.1.10.122: bytes=1418 time<1ms TTL=255
    Reply from 10.1.10.122: bytes=1418 time=47ms TTL=255
    Reply from 10.1.10.122: bytes=1418 time=45ms TTL=255
    Reply from 10.1.10.122: bytes=1418 time=41ms TTL=255
    Request timed out.
    Reply from 10.1.10.122: bytes=1418 time<1ms TTL=255
    Reply from 10.1.10.122: bytes=1418 time<1ms TTL=255

  • I cannot reproduce this. Can you attach a simple example (export to Export->General->archive file in CCS).

    Todd

  • Hey Todd,

    What do you run your software on for the AM335x builds? If you had a dev kit with a MityARM from Critical Link, I might be able to get you something that would work. Otherwise we could close this as the other problem I'm working on causes a exception interrupt, so at least there is something more concrete to work with.

    Thanks,
    John C.
  • I'm using a BeagleBone. If you can just get the example somewhat simple on your board, you can export and attach it. I really want to see the .cfg file, c files and compiler options. I put those into a BeagleBone example and try to reproduce the problems.

    Todd

  • Todd,

    The big problem I see with using the BeagleBone is that it has the Ethernet on Port 1. My suspicion is that the issue is related to our use of Ethernet port 2 on the 335x. I spent a good 2-3 weeks in the sys_bios_driver code and there are some anomalies in there as mentioned in this thread:
    http://e2e.ti.com/support/embedded/tirtos/f/355/t/364319
    I'd be glad to attach the config file and compiler output (to see the options) if you'd like.

    John C.