RTOS/AM3358: NDK ping issue

Part Number: AM3358
Other Parts Discussed in Thread: SYSBIOS

Tool/software: TI-RTOS

Hi,

  I think I have the same problem as the one described in this thread: https://e2e.ti.com/support/processors/f/791/t/701299?tisearch=e2e-sitesearch&keymatch=%20user:290018

  I have a custom AM335x board that is almost identical to the BBB.

  With my own application, or with official examples such as 'NIMU_BasicExample_evmAM335x_armExampleproject' and 'NIMU_FtpExample_evmAM335x_armExampleproject', ping (ping 192.168.1.4) works for about 5 minutes and then suddenly fails. Only a power cycle seems to recover it (reloading the .out via JTAG does not restore the network).

  My environment is:

  * pdk_am335x_1_0_14

  * ndk_3_40_01_01

  * Bios 6.75.2.00

  * EDMA3 2.12.5

  With the default setting 'Global.ndkTickPeriod = 200', ping fails after about 5 minutes.

  With the setting 'Global.ndkTickPeriod = 2000', ping lasts for more than 10 minutes.

  When the same board runs Linux, ping always works, which indicates that the hardware is OK.

 

  I traced through some of the NIMU and EMAC code and have some findings:

  * At the moment the ping fails, the AM335x sends out an ARP request (60 bytes). The address it is trying to resolve is that of the peer, which keeps trying to ping it.

  * The last TX packet did not trigger a TX interrupt, so the TX routine is held up and stuck. Meanwhile, RX still works.

   Any help will be greatly appreciated, thanks!

Best Regards,

GAN XJ

  • Hi,

    I can't find a BBB with a JTAG header to test with. Let me find an AM335x GP EVM to see if I can reproduce the issue, and I will update here.

    Regards, Eric 

  • Hi,

    From the NDK user guide: Global.ndkTickPeriod lets you adjust the NDK heartbeat rate. The default is 100 ticks. This matches the default SYS/BIOS Timer object, which drives the SYS/BIOS Clock and is configured so that 1 tick = 1 millisecond. However, you can configure a new Timer and use that to drive the Clock module. If that new Timer is not configured such that 1 tick = 1 millisecond, then you should also adjust the NDK tick period accordingly.

    From this E2E: https://e2e.ti.com/support/legacy_forums/embedded/tirtos/f/355/p/343389/1207113

    If you have the following in the SYS/BIOS configuration file:

    Clock.tickPeriod = 500;  // this is 500 us

    Global.ndkTickPeriod = 200;  // this is correct: 500 us x 200 = 100000 us = 100 ms
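
The arithmetic above can be written as a small helper. This is only a sketch: it assumes the intended NDK heartbeat is the 100 ms implied by the 500 us x 200 example, and the function names are made up for illustration (they are not part of the NDK or SYS/BIOS APIs).

```c
#include <assert.h>
#include <stdint.h>

/* Target NDK heartbeat in microseconds, taken from the
 * 500 us x 200 = 100 ms example (an assumption, not read from the NDK). */
#define NDK_HEARTBEAT_US 100000u

/* Heartbeat period in microseconds produced by a pair of settings.
 * clock_tick_us mirrors Clock.tickPeriod, ndk_tick_period mirrors
 * Global.ndkTickPeriod. Hypothetical helper for illustration only. */
uint32_t ndk_heartbeat_us(uint32_t clock_tick_us, uint32_t ndk_tick_period)
{
    return clock_tick_us * ndk_tick_period;
}

/* Global.ndkTickPeriod value that gives a 100 ms heartbeat for a given
 * Clock.tickPeriod. Requires a tick period that divides 100 ms evenly. */
uint32_t ndk_tick_period_for(uint32_t clock_tick_us)
{
    assert(clock_tick_us > 0 && NDK_HEARTBEAT_US % clock_tick_us == 0);
    return NDK_HEARTBEAT_US / clock_tick_us;
}
```

For example, ndk_tick_period_for(500) gives 200 and ndk_tick_period_for(1000) gives 100, so a 1 ms Clock tick pairs with the documented default of 100.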

    I just pinged an AM335x GP EVM for 40 minutes (2507 packets) and could not reproduce any ping failure with the above setting. You can check some NDK statistics in your failure case.

    Regards, Eric

  • Thanks for your quick reply!

    My hardware uses an 18.432 MHz oscillator instead of the 19.2 MHz or 24 MHz oscillators on the EVM and BBB boards.

    This is the only difference between our hardware and others.

    My .cfg configuration was:

    ///////////////////

    Clock.tickPeriod = 100;

    Global.ndkTickPeriod = 100;

    ///////////////////

    Then I changed it to:

    ///////////////////

    Clock.tickPeriod = 1000;

    Global.ndkTickPeriod = 100;

    ///////////////////

    After this change, Task_sleep(1000) really does seem to sleep for 1 second.

    But the ping still always fails after a few minutes.

    The ping is run from a Linux virtual machine with the command 'sudo ping -s 1000 -i 0.1 192.168.1.4', which means the ping packet size is 1000 bytes and the interval between two ping packets is about 0.1 s. The failure seems to be related to elapsed time, not to packet size or count. With '-i 0.1', it always fails at the 1300th ping, about 3 minutes in. Without '-i' (default 1 s per packet), it always fails at the 130th ping, also about 3 minutes in.

    At the point of failure, an ARP request is sent from the NDK, and after that the ping never gets a response.
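
A quick check of the numbers above supports the time-based reading. The sketch below only multiplies the reported ping counts by the send intervals; the helper name is hypothetical, and it assumes ping sends at exactly the requested interval.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time in milliseconds until the n-th ping, assuming a fixed
 * send interval. Hypothetical helper for illustration only. */
uint32_t ping_elapsed_ms(uint32_t ping_count, uint32_t interval_ms)
{
    /* Guard against 32-bit overflow in the multiplication. */
    assert(interval_ms == 0 || ping_count <= UINT32_MAX / interval_ms);
    return ping_count * interval_ms;
}
```

Both runs land on the same value: ping_elapsed_ms(1300, 100) and ping_elapsed_ms(130, 1000) each give 130000 ms, i.e. roughly 130 seconds. The packet count differs by a factor of ten while the elapsed time does not, which is consistent with a timer expiring rather than a counter or buffer filling up.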

  • Remember that I mentioned that an ARP request is sent out from the NDK at the moment the ping fails.

    I modified the source 'ndk_3_40_01_01/packages/ti/ndk/stack/route/rtable.c' by setting '_RtNoTimer = 1' (it was 0), then rebuilt the NDK and my application. Everything now works fine, and the ping can run for several hours.

    It looks as if some timeout occurs in the NDK route table, and the network breaks when the stack tries to delete and update the route.

    I don't know the details of the error, whether any .cfg entries can fix it, or whether this is an NDK bug.

    But this modification does fix it.
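
To make the hypothesis concrete, here is a toy model of a cached route entry that ages out. It is not the NDK's code: the struct, the function, and the meaning given to the no_timer flag (assumed here to stop entries from expiring, as the '_RtNoTimer = 1' result suggests) are all illustrative assumptions, since the thread does not show the actual rtable.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry; NOT the NDK's actual route structure. */
typedef struct {
    uint32_t expires_at_s; /* absolute expiry time in seconds */
    bool     resolved;     /* true while the peer's MAC address is cached */
} toy_route_t;

/* Returns true when the entry has to be re-resolved, i.e. when an ARP
 * request would be needed before the next packet can be sent.
 * no_timer stands in for the assumed effect of _RtNoTimer = 1:
 * when set, entries never age out. */
bool toy_route_needs_arp(toy_route_t *rt, uint32_t now_s, bool no_timer)
{
    assert(rt != NULL);
    if (!no_timer && rt->resolved && now_s >= rt->expires_at_s) {
        rt->resolved = false; /* entry timed out; drop the cached mapping */
    }
    return !rt->resolved;
}
```

In this model, an entry that expires at 130 s needs a fresh ARP exchange from that moment on, and that exchange depends on a working TX path. With no_timer set the entry stays resolved indefinitely, so a stalled TX path would never be exercised by a re-resolution. That matches the observed symptoms but does not explain why TX stalls in the first place.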

  • Hi,

    Thanks for the update, and glad you found a fix! Let me check with the NDK team whether this is a bug and whether they have any other suggestions.

    Regards, Eric