
Ethernet packet loss with NDK 2.21

Other Parts Discussed in Thread: OMAP-L138, SYSBIOS, OMAPL138

Hello,

We are using the OMAP-L138 with SysBios on both the ARM and the DSP. In a test, we send 8K UDP frames with a fixed pattern from the ARM to a PC. As long as the CPU frequency is 300MHz, everything is fine, but at 456MHz we observe packet loss within UDP frames.

A typical Wireshark dump then looks like this:

No.     Time        Source                Destination           Protocol Length Info
 299100 129.230709  10.128.30.66          10.128.20.74          IPv4     1514   Fragmented IP protocol (proto=UDP 0x11, off=0, ID=be9f)
 299101 129.230830  10.128.30.66          10.128.20.74          IPv4     1514   Fragmented IP protocol (proto=UDP 0x11, off=1480, ID=be9f)
 299102 129.230956  10.128.30.66          10.128.20.74          IPv4     1514   Fragmented IP protocol (proto=UDP 0x11, off=2960, ID=be9f)
------------------------------------------------------------------------ missing packet -------------------------------------------------------------------
 299103 129.231204  10.128.30.66          10.128.20.74          IPv4     1514   Fragmented IP protocol (proto=UDP 0x11, off=5920, ID=be9f)
 299104 129.231264  10.128.30.66          10.128.20.74          IPv4     622    Fragmented IP protocol (proto=UDP 0x11, off=7400, ID=be9f)

Versions used:

- SysBios: 6.33.04.39
- NDK:     2.21.00.32
- NSP:     1.10.00.03

Any help or hints on what could go wrong, and on how to verify that there is no hardware restriction, would be appreciated.

Best regards

Stéphane

  • Stephane,

    When you are running at the different CPU speeds… do you run the program at the same CPU speed from startup?  Or, are you dynamically scaling the CPU frequency at runtime?

    If you are dynamically scaling at runtime, I wonder if the related peripherals need to be reconfigured for the higher frequency after the scaling.  In other words, the scaling operation changes the peripheral clock rates too, and the peripheral settings need to be adjusted afterwards.  (This is just a guess.)

    If you are not dynamically scaling the CPU frequency, then this seems backwards: one would expect a slower CPU, not a faster one, to have trouble keeping up under load.

    Can you please describe your application some more?

    Thanks,
    Scott

  • Hi Scott,

     

    We do a kind of dynamic frequency scaling:

    • As a first step, we start our bootloader at 300MHz, which:
      • loads the main application via TFTP
      • raises the input voltage from 1.2V to 1.3V (requires I2C)
      • changes the CPU frequency to 456MHz
      • jumps to the _c_int00 vector of the main application

    The main application then runs at a fixed 456MHz.

    The MII connection to the PHY is clocked from SYSCLK4, so we adapted the NSP such that the MDIO clock divider tracks the current SYSCLK (MDIO.CONTROL.CLKDIV = 114 in the 456MHz case).
    We also compiled the NDK with #define MMALLOC_MAXSIZE 8192 (instead of 3068, in pbm.c) and #define RAW_PAGE_SIZE 8192 (in mem_data.c).
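
    Roughly, the divider derivation looks like this (a sketch; the ~1MHz MDC target and the helper name are our illustration, not actual NSP code):

        #include <stdint.h>

        /* Sketch: derive MDIO.CONTROL.CLKDIV from the current SYSCLK4 rate.
         * MDIO_CLK = VBUSCLK / (CLKDIV + 1), and MDC must stay well below
         * 2.5MHz, so divide down to roughly 1MHz. */
        static uint32_t mdioClkdivFor(uint32_t vbusclk_hz)
        {
            return vbusclk_hz / 1000000u;  /* 114MHz -> CLKDIV 114 -> MDC ~991kHz */
        }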

    To configure NDK from our application we call:

    uint tmp = 8192; /* maximum datagram size we want to support */
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_IPREASMMAXSIZE, CFG_ADDMODE_UNIQUE, sizeof(uint), (UINT8 *)&tmp, 0);
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_SOCKUDPRXLIMIT, CFG_ADDMODE_UNIQUE, sizeof(uint), (UINT8 *)&tmp, 0);

    setsockopt(ucom->socket, SOL_SOCKET, SO_RCVBUF, &tmp, sizeof(tmp));
    setsockopt(ucom->socket, SOL_SOCKET, SO_SNDBUF, &tmp, sizeof(tmp));

    We have a communication client task which, in our test scenario, sends an 8K UDP frame (sendto(...)) and then waits for the reply (recvfrom(...)) in a for loop, as sketched below.
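
    In outline (a simplified sketch; 'TestLoop' and its arguments are placeholders for our real code, and the socket is assumed to have a receive timeout configured):

        #include <ti/ndk/inc/netmain.h>

        /* Sketch: send an 8K frame, wait for the echoed reply, count timeouts. */
        static int TestLoop(SOCKET sock, struct sockaddr_in *peer,
                            char *txBuf, char *rxBuf, int numFrames)
        {
            int i, lost = 0;
            for (i = 0; i < numFrames; i++) {
                if (sendto(sock, txBuf, 8192, 0,
                           (struct sockaddr *)peer, sizeof(*peer)) != 8192) {
                    return -1;             /* send failed */
                }
                if (recvfrom(sock, rxBuf, 8192, 0, NULL, NULL) < 0) {
                    lost++;                /* timeout: no reply from the PC */
                }
            }
            return lost;
        }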

    Thanks

    Stéphane

  • Stephane,

    OK, thanks for the details. That is not the kind of scaling I was thinking of that would require dynamic adjustments.

    I will have to ask some experts for their help with this one…

    Scott

  • Hi Scott,

    To verify that there isn’t a hardware restriction, I loaded Linux (Angstrom v2012.03-core, kernel 3.2.0, provided by CriticalLink) onto the OMAP-L138 and wrote a similar test routine (sending 8K UDP frames). Result: I couldn’t detect a single packet loss!

      

    Conclusion:

    The hardware is completely innocent.

    Best regards

    Stéphane

  • Hi Stephane,

    Thanks for the additional information.

    Can you please verify that the SYS/BIOS Clock rate is configured the same in both the 300MHz and 456MHz cases?  The default Clock.tickPeriod for an application is 1000usec, and the NDK creates a Clock object whose function gets called every 100th tick.  This is the NDK “heartbeat” and it must beat every 100msec.  Our thinking is that maybe this has changed between your two cases.
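
    One quick way to check this at runtime (a sketch; it assumes SYS/BIOS 6.x, where the configured tick period is readable from C as Clock_tickPeriod):

        #include <xdc/std.h>
        #include <xdc/runtime/System.h>
        #include <ti/sysbios/knl/Clock.h>

        /* For the NDK's 100-tick heartbeat to fire every 100ms,
         * this must print 1000 (microseconds). */
        void printTickPeriod(void)
        {
            System_printf("Clock_tickPeriod = %d us\n", (Int)Clock_tickPeriod);
        }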

    If this is not the issue, can you please post the contents of the application configuration file (with the .cfg extension)?

    Thanks,
    Scott

  • Hi Scott,

    I think our SysBios clock tick has the correct period. On the one hand, we use the BIOS_setCpuFreq() function to adjust the BIOS to the given frequency; on the other hand, GPIOs driven by a function installed with Clock_create() always toggle at the same speed.
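
    For reference, that adjustment looks roughly like this (a sketch of the standard SYS/BIOS call; we make it before BIOS_start()):

        #include <xdc/std.h>
        #include <xdc/runtime/Types.h>
        #include <ti/sysbios/BIOS.h>

        /* Tell SYS/BIOS the new CPU rate so the timers behind
         * Clock ticks are reprogrammed for 456MHz. */
        void setCpuFreq456(void)
        {
            Types_FreqHz freq;
            freq.hi = 0;
            freq.lo = 456000000;   /* 456MHz */
            BIOS_setCpuFreq(&freq);
        }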

    Please find attached our configuration file (ArmApp.cfg).

    Stéphane

    ArmApp.cfg
  • Hi Stephane,

    I was able to reproduce some UDP packet loss using the NDK's 'testudp' application.  I'm looking into the cause.  With the testudp app I didn't see any noticeable difference in loss rate when setting the CPU to 456MHz.

    Another thing you can check is the set of global statistics variables in the NDK.  You can enter the following variable names into the 'Expressions' window of CCS (or dump them from code, as sketched after the list):

    • udps
    • ips
    • raws
    • tcps (just mentioning TCP stats name for completeness ...)
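
    For example (a sketch; the UDPSTATS field names are taken from the NDK 2.x stack headers and may differ slightly between versions):

        #include <stdio.h>
        #include <ti/ndk/inc/netmain.h>

        /* Dump a few counters from 'udps', the NDK's global UDP
         * statistics structure (see the stack's udp.h). */
        void dumpUdpStats(void)
        {
            printf("UDP: SndTotal=%u RcvTotal=%u RcvNoPort=%u RcvBadSum=%u\n",
                   (unsigned)udps.SndTotal, (unsigned)udps.RcvTotal,
                   (unsigned)udps.RcvNoPort, (unsigned)udps.RcvBadSum);
        }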

    Lastly, it may help me to try your test case ... would you mind sharing your NDK application code as well as the corresponding Windows side executable?

    Steve

  • Hi Steve,

    Thank you for your input. I will check the NDK statistics and post them if I see something noticeable.

    It is not a problem of the number of lost packets, but of running into a timeout (500ms) because we don’t receive a UDP acknowledge from the server (PC) when we send a corrupted UDP frame.

    Concerning the project sharing: Is there a way to send you the project directly? (The zipped project is about 60MB.)

    Stéphane

  • Hi Steve,

    I made some snapshots of the NDK counters.

    For that, I ran a UDP NetTest with 10000 UDP frames of 8000 bytes each. My log says:

    00:00:24:       do: TestNet started
    00:00:26:  network: UComClient err_rx_timeout <--packet loss! We don’t get an acknowledge from the server
    00:00:27:  network: UComClient err_rx_timeout
    00:00:30:  network: UComClient err_rx_timeout
    00:00:31:  network: UComClient err_rx_timeout
    00:00:33:  network: UComClient err_rx_timeout
    00:00:34:  network: UComClient err_rx_timeout
    00:00:35:  network: UComClient err_rx_timeout
    00:00:36:  network: UComClient err_rx_timeout
    00:00:38:  network: UComClient err_rx_timeout
    00:00:39:  network: UComClient err_rx_timeout
    00:00:43:  network: UComClient err_rx_timeout
    00:00:44:  network: UComClient err_rx_timeout
    00:00:45:  network: UComClient err_rx_timeout
    00:00:46:  network: UComClient err_rx_timeout
    00:00:47:       do: tlg size =  8000 =>   421 tlg/s  3369 kByte/s
    00:00:47:       do: TestNet done

    Stéphane

    NDK statistics.xlsx
  • Stéphane Peter said:
    Concerning the project sharing: Is there a way to send you the project directly? (The zipped project takes about 60MB).

    Yes, you can send it to me privately through this forum.  In order to do that, we must be "friends" on the forum; once we are friends, we can exchange private emails, so your project will not be seen by others on the forum.

    Stéphane Peter said:
    It’ is not a problem of the amount of the lost packets, but of running into a timeout (500ms), because we don’t receive a UDP acknowledge from the server (PC), when we send a corrupted UDP frame.

    I'd like to make sure I understand this better.  You are sending a corrupted UDP frame from the NDK to the PC, and the NDK application is hitting a timeout because the PC is not responding to the corrupted UDP frame?

    Steve

  • Hi Steve,

    You are partially correct. Our client task sends a UDP frame with the NDK function sendto( … ) and waits for an answer using recvfrom( … ).

    Normally, we have a sequence like this:

    But in some cases we observe this scenario, where one UDP sub-packet is missing on the PC side:

    Stéphane

  • Hi Steve,

    In your answer from Feb 15 2013 20:48, you mentioned that you didn’t see any noticeable difference in loss rate. For a simple UDP test, where data is sent regardless of completeness and without an overlying protocol, this makes complete sense, because losing 1 or 2 packets per second is statistically irrelevant. But in our case we run into a timeout for each packet loss, which seriously decreases the throughput. In my opinion, the reduced throughput in our test merely highlights the problem caused by the NDK.

    But you also mentioned that you could reproduce some UDP packet loss using the NDK’s ‘testudp’ application. Do you already have findings on the cause? I guess that solving this issue will also be the solution for our problem.

    Stéphane

  • Hi Stephane,

    I'm still trying to pinpoint the location in the stack that's dropping the UDP packet.  I'll let you know as soon as I have more info.

    Steve

  • Hi Steve,

    Your last status update is now more than a week old. Have you found anything remarkable in the meantime?
    Do you at least have something like a time schedule for this issue?

    Best regards Stéphane

  • Hi Stephane,

    Unfortunately I don't have anything remarkable to report yet.  I have studied the code you sent, though.

    When you see the packet dropped, are you seeing an error on the return value for sendto?

    If so can you check the status of fdError() at this point?  I've modified your code to do that:

        int n = sendto(ucom->socket, (char *)data, dataLen, 0,
                       (struct sockaddr *)&ucom->client, sizeof(struct sockaddr_in));
        ok = (n == dataLen);
        if (!ok) {
            UComErrorSet(ucom, UCOM_ERR_TX_FAILED);
            DbgPrintf(DBG_INFO, "sendto failed: %d\n", fdError());
            return FALSE;
        }

    Similarly, could you check the recvfrom call?

    Steve

  • Hi Steve,

    I checked both calls:

    • sendto() never fails. The return value always corresponds to the amount of data to send
    • recvfrom() fails with error code 35

    Stéphane

  • Hi Steve,

    I played around a little with different clock configurations.
    I started with the following clock configuration, which works without errors:

    ARM9_0: GEL Output:
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output: |              Clock Information             |
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: PLLs configured to utilize crystal.
    ARM9_0: GEL Output: ASYNC3 = PLL0_SYSCLK2
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: NOTE:  All clock frequencies in following PLL sections are based
    ARM9_0: GEL Output: off OSCIN = 24 MHz.  If that value does not match your hardware
    ARM9_0: GEL Output: you should change the #define in the top of the gel file, save it,
    ARM9_0: GEL Output: and then reload.
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output: |              PLL0 Information             |
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: PLL0_SYSCLK1 = 300 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK2 = 150 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK3 = 100 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK4 = 75 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK5 = 100 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK6 = 300 MHz
    ARM9_0: GEL Output: PLL0_SYSCLK7 = 50 MHz
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output: |              PLL1 Information             |
    ARM9_0: GEL Output: ---------------------------------------------
    ARM9_0: GEL Output:
    ARM9_0: GEL Output: PLL1_SYSCLK1 = 300 MHz
    ARM9_0: GEL Output: PLL1_SYSCLK2 = 150 MHz
    ARM9_0: GEL Output: PLL1_SYSCLK3 = 100 MHz

    Then I changed PLL1_SYSCLK1 by modifying the PLLM register via JTAG, and suddenly I got packet losses again:

    PLL1_SYSCLK1 = 240MHz: 2 loss per 50000 packets
    PLL1_SYSCLK1 = 228MHz: 1 loss per 50000 packets
    PLL1_SYSCLK1 = 216MHz: x00 loss per 50000 packets

    At first I thought that changing the clock without re-initializing the DDR could be the reason, but in that case other DDR accesses should fail as well, and I would expect the application to no longer run stably. Is this conclusion correct?

    What about your “testudp” app? Do your observations match our results?

    Stéphane

  • Hi Stephane,

    Sorry I don't have much for you right now, as I'm out of the office until Tuesday.  I can help more when I return.

    But I see that you're getting the same error code that I saw on my simple test program - 35.  Looking at ti/ndk/inc/serrno.h you can see that this corresponds to EWOULDBLOCK.

    It means that no data was available yet, so the receive call would have blocked; since the call was made on a non-blocking socket, it cannot block, so it returns EWOULDBLOCK.

    What happens if you try making the socket blocking?
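
    For example (a sketch; SO_BLOCKING is the NDK-specific option for switching blocking mode, and the 1-second SO_RCVTIMEO keeps a lost reply from hanging the task forever):

        #include <ti/ndk/inc/netmain.h>

        /* Sketch: switch socket 's' to blocking mode with a receive timeout,
         * so recvfrom() waits for data instead of returning EWOULDBLOCK. */
        void makeBlocking(SOCKET s)
        {
            int on = 1;
            struct timeval to;

            to.tv_sec  = 1;
            to.tv_usec = 0;
            setsockopt(s, SOL_SOCKET, SO_BLOCKING, &on, sizeof(on));
            setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &to, sizeof(to));
        }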

    Steve

  • Hi Steve,

    Unfortunately, I was not able to connect my module client to the host server with a blocking client socket. But I guess that the call would block until it receives the ICMP packet from the host (I will keep you updated).

    Today I disabled the MMU, and I didn’t lose a single packet. However, the bandwidth dropped by almost a factor of 2, and the CPU load was over 90% while only sending dummy test packets.

    Stéphane

  • Hi Stephane,

    Just an update that I'm still debugging here.  I also tried the project that you emailed me.  I was able to build it; however, I could not load it due to an incompatible memory map in the custom platform file set up in the project.  Did you ever try this on the TI EVM?  If so, and you have the project for that, it would be helpful to send me that also.

    Steve

  • Hi Steve,

    Sorry, I can't help there: we never used the TI EVM. We started directly with the MityDSP-L138F (CriticalLink provides its own development kit), so we have no other platform file.

    Stéphane

  • Hi Stephane,

    I've made progress today but still haven't been able to pinpoint why the UDP packet is being dropped.  Also, everything I'm working on is with the standard NDK client application + the PC side app 'testudp'.

    Here's where I'm at:

    1. I see the client side app 'testudp' send out a UDP packet.  It then waits for a response from the NDK.  Usually this works but eventually it hits the case where a packet is sent but the NDK never sends a response.
    2. I've added tracer code all through the stack in order to follow the path of this packet in the NDK.
    3. In the failure case, I see that the packet is received by the NDK in the application layer (recvncfrom())
    4. The NDK app then sends the same packet back to the client side app (sendto())
    5. I am able to trace the packet all the way down the stack and into the driver
    6. I see that in the driver the packet is queued up for the hardware to pick up (emacEnqueueTx)
    7. I see the transmit interrupt occurs (EMAC_TxServiceCheck called)
    8. I see the packet is freed "after it has been sent" (emacDequeueTx)

    This is as far as I've been able to get.  It looks like the packet makes it all the way, but I never see it go out on the wire.  I think I will need help from someone who knows the driver code better to find out whether everything is behaving correctly here.

    Steve

  • Hi Steve,

    Thanks for your effort and the update.

    Stéphane

  • Hello Steve,

    I have the same packet loss problem as Stephane, and our application is largely similar to his. Our software on the EVM just broadcasts UDP packets and takes no feedback from the receiver. We send out a big buffer split into packets with 1460-byte UDP payloads, try to collect this buffer on the PC side, and verify the CRC. We see drops happening randomly, but sendto() never complains.

    What might be interesting to you is that the software boots from SPI flash and starts broadcasting data on power-up. That is where the packet loss happens.

    But when we load the .out file (the one used for creating the SPI-flash bootable binary) with Code Composer and run it, it runs forever without any packet loss.

    We are looking into the differences between the hardware configuration done by the CCS GEL files and the AISgen hardware configuration used for the bootable binary.

    Maybe this info helps you.


    -Sara 


  • Hi Sara,

    Can you please create a new thread for the issue you are seeing?  Please be sure to mention the version of NDK and BIOS you are using in your application.

    Thanks,

    Steve

  • Hi Stephane,

    Here's my latest update. But first, I'm not sure I asked you before: could you please explain your network topology? Are you using a router, switches, etc. between the PC and the ARM?

    Also, are you using static IP addresses?  (I think the answer to that is yes based on the application you shared with me).

    So, in order to better debug the issue I'm able to see on my side (the client example on an OMAPL138 communicating with a Linux laptop running the NDK testudp app on a private 192.x.x.x router network), I've changed the topology so that the devices are connected through a Dlink switch.  The switch supports port mirroring, which allows me to capture all of the packets that go out onto the network from another, independent machine (a Windows 7 machine).

    What I see in this case is that Wireshark on my Linux machine (which is also running the testudp app) yields the same result: it still doesn't show the reply from the NDK when the failure happens.

    However, when viewing the capture on the Windows 7 machine, which shows all packets, I indeed see that the NDK has transferred the packet out onto the wire.

    So, it seems that in this case, the packet reaches the wire but the Linux machine doesn't see it.  I believe this means that the UDP packet is being dropped in the Linux stack, or maybe even at the hardware level.  I can't say for sure, but it looks to me that in this failure, the NDK is doing its job - the packet is going out on the wire.

    Are you able to make a similar test on your topology?  Do you have a switch with port mirroring that will allow you to capture all data on the wire from all hosts in your network topology?

    Lastly, I wanted to bring up a question that's been on my mind while debugging all of this ... the dropped packet we are investigating is a UDP packet, and by definition is unreliable ... do you think we may be putting too much faith in a protocol that is unreliable by design?

    Anyway, it's still very possible that your failure case is different from what I'm actually seeing in the scenario I've been investigating here.  I'm curious to know what the capture would show in your case.

    Steve

  • Hi Steve,

    Usually, we use a 100Mbit Ethernet switch between the module and the PC. We also tried connecting the module directly to the PC, always with the same result: we lose some packets.

    Unfortunately, I was not able to organize a switch with port mirroring today. I did try different PCs (Win7, XP), but none of them worked without packet loss.

    Given your description that the packet is on the wire but the receiving side sometimes doesn’t take the data, I checked MII_TXEN for inter-packet gap (IPG) violations, but I couldn’t find an IPG smaller than 963ns on MII_TXEN. Do you have an explanation for that behavior?
    Can you send me both Wireshark traces?

    Best regards

    Stéphane

  • Hi Stephane,

    I've attached the Wireshark captures.  Here's the details on the test scenario:

    Linux laptop: 192.168.1.2

    evmOMAPL138: 192.168.1.3

    The laptop and target board are hooked up to a Dlink switch that allows port mirroring.  A Netgear router is also connected to the switch to allow the hosts to get IP addresses via DHCP.  There's also a third host, a Windows 7 PC.  This PC is connected to the switch, and the switch is configured to mirror all network traffic to the Ethernet port that the Win7 machine is connected to.  So, the Wireshark captures on this machine should show all data "on the wire."

    I run the following command on the Linux laptop against the NDK target.  The command runs for a while, then I see it fail on packet size 88:

    testudp 192.168.1.3

    ...
    testsize = 70
    testsize = 71
    testsize = 72
    testsize = 73
    testsize = 74
    testsize = 75
    testsize = 76
    testsize = 77
    testsize = 78
    testsize = 79
    testsize = 80
    testsize = 81
    testsize = 82
    testsize = 83
    testsize = 84
    testsize = 85
    testsize = 86
    testsize = 87
    testsize = 88
    ./testudp: failed on size 88, select returned: 0
    numSent = 7424
    ULA0323418:~/temp/winapps>

    Looking at the Wireshark capture taken on the Linux laptop, I can also see that the program sent out the packet with size 88 (I modified the testudp client to write the size of the UDP payload into the beginning of the data), but I never see the response.  Note that at the beginning of the echo data in the payload you can see 0x58 == 88 decimal.  So this is the packet corresponding to the output above:

    Now, here's the capture from the Win7 box.  Notice that the response from the NDK is seen on the wire:

    Steve

    7737.packet-on-wire.zip

  • Hello Steve,

    I think that we are not seeing the same problem:

    • Our problem only occurs with bigger UDP frame sizes (typically > 4K), where more than one Ethernet packet has to be used for one frame.

    • The problem disappears with changes on the OMAP (disable MMU for example).

    • Different PCs show the same behavior

    We have the impression that the issue is somehow cache/memory-access related. Two observations lead to this conclusion:

    1. Disabling caching in the project configuration file makes the communication stable.

    2. We are using a FIFO to queue the messages that have to be sent. When the whole set of test UDP messages fits into this FIFO, no packets are dropped. But as soon as we have to wait (sleep) because the queue is almost full, the disaster starts.
       UcomTestNet with tlgTotal=400 works, while 420 goes to sleep and some packets are dropped.

    Can you manage to get our project up and running on your hardware? Otherwise, what kind of EVM board are you using, so that we can try to adapt our code to that target?

    Best regards

    Stéphane

  • Hi Stephane,

    I am using the TI evmOMAPL138 board (LCDK) here.  I have tried to build your app but could not get it to load properly.

    Do you have a TI EVM?  If so, can you try it on that?  If you can, and you send the project to me again, I will be able to reproduce your issue on my side more easily.

    Steve

  • Hi Steve,

    Thanks for trying. I will order one of the evm boards and send you a new project on that one.

    Stéphane

  • Hi Steven Connell!

    I was able to fix the problem.
    There is a document, SPRU523H, chapter 3.5.1:

    "UDP application drops packets on send() calls."

    Yes! We had a scheduler problem:
    our network tasks had a higher priority than the network stack.

    I changed that.

    Now the network stack runs at a higher priority (NC_PRIORITY_HIGH = 8) and
    our network communication tasks run at a lower priority (NC_PRIORITY_LOW = 2).
    The problem disappeared.
    NSP and NDK work without any modification.
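
    In code, the arrangement looks roughly like this (a sketch; 'UComClientTask' is a placeholder for the application's communication task, and error handling is omitted):

        #include <ti/ndk/inc/netmain.h>
        #include <ti/sysbios/knl/Task.h>

        extern void UComClientTask(void);    /* placeholder application task */

        void netInit(void)
        {
            Task_Params tp;

            /* NDK event scheduler above the application tasks
             * (see SPRU523, section 3.5.1). */
            NC_SystemOpen(NC_PRIORITY_HIGH, NC_OPMODE_INTERRUPT);

            Task_Params_init(&tp);
            tp.priority = NC_PRIORITY_LOW;   /* app task below the stack */
            Task_create((Task_FuncPtr)UComClientTask, &tp, NULL);
        }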

    The problem is solved for me.
    Don't invest more time.
    I will inform Stéphane Peter as soon as he is back from holiday.
    Thanks for your work!

    Stefan Gehrig

  • Hi Steve and Stefan,

    If only it were that simple!

    Unfortunately, Stefan ran his tests with only 400 UDP frames, which isn’t enough to provoke the failure because they fit directly into the transmit FIFO (see my post from Mar 27th).

    Stéphane

  • Hi Steve,

    I think we have found the reason why the EMAC doesn’t send some Ethernet packets. We suspect that a FIFO underrun occurs because the DDR is blocked by higher-priority DMAs (cache traffic). Is there a register indicating such an underrun? When the EMAC DMA is given at least the same priority as the instruction and data cache DMAs, the Ethernet transmission works without failure.

    Can you explain why TI has chosen a low default priority for the EMAC DMA when the hardware FIFO requires such a tight latency (5.12us according to chapter 19.2.12)? Is there a reason for that? Do we run into other problems when we raise the EMAC DMA priority? What do we have to take into account when playing with these priority levels?
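
    For reference, the kind of change we made looks roughly like this (a hypothetical sketch: the MSTPRI2 offset and the EMAC field position are assumptions that must be verified against the OMAP-L138 TRM, SPRUH77; the KICK unlock values come from the same document):

        #include <stdint.h>

        #define SYSCFG0_BASE  0x01C14000u
        #define KICK0R   (*(volatile uint32_t *)(SYSCFG0_BASE + 0x038u))
        #define KICK1R   (*(volatile uint32_t *)(SYSCFG0_BASE + 0x03Cu))
        #define MSTPRI2  (*(volatile uint32_t *)(SYSCFG0_BASE + 0x118u))

        /* Raise the EMAC bus-master priority (0 = highest) so its FIFO
         * refills are not starved by the cache masters. */
        void raiseEmacPriority(void)
        {
            KICK0R = 0x83E70B13u;        /* unlock the SYSCFG0 registers */
            KICK1R = 0x95A4F1E0u;
            MSTPRI2 &= ~(0x7u << 0);     /* assumed 3-bit EMAC field -> 0 */
            KICK0R = 0x0u;               /* re-lock */
            KICK1R = 0x0u;
        }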

    Stéphane

  • Hi Stephane,

    Great news that you have solved the problem!  Sorry, but I'm not sure I can be of much help with the questions you have ... I'm not familiar with the DMA. I'm also not sure if there's a register that will indicate underrun (a quick search of sprs586d.pdf didn't yield anything).

    I would guess that when choosing DMA priority settings, you should consider what is most important to your application. If Ethernet data is the most important, then it should be given a higher priority.

    Steve