RTOS/AM5726: Unstable Ethernet

Part Number: AM5726

Tool/software: TI-RTOS

Hi,

Some background information:

We copied the Ethernet module from the idk5726 schematics, but we use only one (1) port instead of 2; the second is disabled.

    EMAC_HwAttrs_V4 cfg;
    EMAC_socGetInitCfg(0, &cfg);
    cfg.port[0].phy_addr = GMAC_PORT1_ETHERNET_PHY_ADRESS;
    cfg.port[1].phy_addr = EMAC_CPSW_NO_PHY_ADDR;
    EMAC_socSetInitCfg(0, &cfg);

We suffered from frequent, random disconnections and an unreliable LAN in general.

After investigating, we discovered that frames are received (GOOD_RX_FRAMES increments) but are not forwarded to the host CPU.

We found a workaround: set BYPASS to "1" in ALE_CONTROL.

We use version v4 of NIMU. In the previous v3, BYPASS is set unconditionally by default with CSL_CPSW_enableAleBypass(); this call was removed in v4.

It seems to have something to do with the aging of the MAC address lookup table in the switch.

Is this problem known? Is there a smarter solution than ALE bypass?

Best regards

Rasty

  • Hi,

    The NIMU V3 driver is for different devices (K2E and K2L). The V4 driver is for AM335x, AM437x and AM57x. When you set ALE BYPASS, all packets received by the MAC modules are forwarded only to the host port (port 0). In bypass mode, the ALE processes host port transmit packets (packets coming into the switch on port 0, destined for port 1 or port 2) the same way it does in normal mode, so there is no impact on the Tx direction.

    I am not sure why, with only one port and ALE BYPASS disabled, you have an unstable link. Where do the incoming good frames go? Since you have only one port and all packets should go to the host (not be forwarded to another port), it is fine to set ALE BYPASS.

    Regards, Eric
  • Hi Eric,

    I did not say that we have an "unstable" link. On the contrary, the link is stable, but packets are not forwarded to the host CPU.

    I think that this problem will also happen with 2 PHYs.

    You can reproduce it relatively easily.

    1. You need a Linux machine with 2 Ethernet ports

    2. Connect the second port to the IDK's LAN

    3. Set the Linux eth1 and the IDK IP to the same subnet

    4. Add a static ARP entry in Linux: "arp -s ip mac"

    5. tcpdump -i eth1

    6. Reset the IDK

    7. Ping the IDK IP

    Make sure that tcpdump does not print anything except ICMP (ping request/reply).

    You should see no replies.

    Best regards

    Rasty

  • Rasty,

    I didn't follow your suggestion on how to reproduce it. I thought this was a typical ping test, which has always worked for me. I also used Wireshark, which is similar to tcpdump, for the packet trace. Here is what I did: I have a Windows PC with two network interfaces (ETH0, ETH1). ETH0 is connected to my office network. ETH1 is connected to one of the AM572x IDK EVM's PHYs, and the EVM and ETH1 are on the same IP subnet. I ran a typical NIMU test application on the IDK and had no issue pinging it.

    Regards, Eric
  • Rasty,

    I can try disabling one PHY, as in your code, to see whether it makes any difference on the TI EVM.

    Regards, Eric
  • Rasty,

    I tested the TI AM572x IDK EVM and I can't reproduce your issue.

    Details:

    1. HW: TI AM572x IDK EVM

    2. SW: pdk_am57xx_1_0_13\packages\MyExampleProjects\NIMU_BasicExample_idkAM572x_armExampleproject, with the code below added to main.c to disable PHY1 (0-based):

        EMAC_HwAttrs_V4 cfg;
        EMAC_socGetInitCfg(0, &cfg);
        cfg.port[0].phy_addr = GMAC_PORT1_ETHERNET_PHY_ADRESS;
        cfg.port[1].phy_addr = EMAC_CPSW_NO_PHY_ADDR;
        EMAC_socSetInitCfg(0, &cfg);

    3. Connected PHY0 to a host PC (192.168.1.11) with an Ethernet cable

    4. Ran the application; it uses a static IP of 192.168.1.4, and the UART output shows only one PHY, linked at 1000 Mbps full-duplex

    5. Pinging from the PC works without issue; see the attached capture

    ping_test_am572.pcapng

    6. Read ALE_CONTROL at 0x48484D08: the value is 0x8000_0000, which shows that ALE_BYPASS (bit 4) is disabled.

    So even with BYPASS = 0, there is no issue. Given that your system has only one Ethernet port, it is fine for you to set this bit so that all incoming packets go directly to the host.

    Regards, Eric
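
The register check described above can be sketched in C. This is only an illustration: the address 0x48484D08 and the bit position (bit 4) are the values quoted in this thread and should be confirmed against the TRM for your device, and the helper names are hypothetical. On the target, the thread mentions CSL_CPSW_enableAleBypass() as the call the v3 driver used for the same purpose.

```c
#include <stdint.h>

/* Values quoted in this thread for the AM572x. Treat them as assumptions
 * and confirm them against the TRM for your device. */
#define ALE_CONTROL_ADDR    0x48484D08u   /* ALE_CONTROL register address */
#define ALE_CONTROL_BYPASS  (1u << 4)     /* ALE_BYPASS bit               */

/* Pure helpers (hypothetical names) so the bit logic can be checked on a PC. */
static inline uint32_t ale_bypass_enable(uint32_t ale_control)
{
    return ale_control | ALE_CONTROL_BYPASS;
}

static inline uint32_t ale_bypass_disable(uint32_t ale_control)
{
    return ale_control & ~ALE_CONTROL_BYPASS;
}

static inline int ale_bypass_is_enabled(uint32_t ale_control)
{
    return (ale_control & ALE_CONTROL_BYPASS) != 0u;
}

/* Read-modify-write through a pointer. On the target one would pass
 * (volatile uint32_t *)ALE_CONTROL_ADDR; on a PC, any uint32_t variable. */
static inline void ale_bypass_set(volatile uint32_t *reg, int enable)
{
    uint32_t v = *reg;
    *reg = enable ? ale_bypass_enable(v) : ale_bypass_disable(v);
}
```

With these helpers, the two register values seen in the thread map onto each other: 0x8000_0000 has bypass disabled, and setting bit 4 gives 0x8000_0010.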

  • Hi Eric,
    The key condition in this test is the static ARP entry:
    arp -s 192.168.1.4 7c-38-66-7d-7c-e8

    Best regards
    Rasty
  • Rasty,

    I deleted the ARP entry on my Windows PC and set the interface card to IPv4 only, but I still saw the ARP broadcast and ping still worked.

    I need to find a way to disable ARP.

    Regards, Eric

  • Hi Eric,

    I did this test in Linux.

    Did you try "arp -s "?

    Thanks

    Rasty

  • Rasty,

    I tried arp -s to add the IP/MAC into the table. It didn't make any difference, ping still worked for me.

    Regards, Eric

  • Hi Eric,

    Please do one last test with this setup.

    1. Start wireshark on that interface

    2. reset target

    3. Ping target

    I expect only ping requests and replies in the Wireshark output.

    If you see a broadcast from Windows, the setup is not equivalent to ours.

    Thanks

    Rasty

  • Rasty,

    I tried some suggestions from the Internet to disable ARP packets: locate the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters; on the Edit menu, point to New and click DWORD Value; add ArpRetryCount = 0; then reboot.

    I still saw ARP from the Windows machine and the ping still worked. I will find a Linux machine for the test.

    Regards, Eric
  • Hi Eric,

    I really appreciate your efforts.

    I'll explain why I insist on disabling ARP to reproduce the problem.

    It took us a long time to figure out that communication is disrupted after a long idle time.

    You would not see it with a simple ping, ftp or similar.

    The problem is the "aging" of the MAC table in the switch. Upon aging, the switch forgets its lookup table and needs to re-learn it.

    As far as I can imagine, the same software part in the NDK that triggers aging should also send some packet to update the Sitara MAC address in the lookup table. This does not happen; at least we did not see anything with tcpdump.

    The ARP broadcast penetrates through the switch and the answer causes re-learning of the MAC table, which I want to eliminate in order to show the problem right away.

    Best regards

    Rasty
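
The aging behaviour described in this post can be illustrated with a toy learning table in C. This is a simplified model, not the CPSW ALE implementation: the table size, the tick-based timestamps and all function names are assumptions made for the sketch. It shows only the principle that an address which stays silent for longer than the aging time disappears from the table until it transmits again.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 8              /* arbitrary size for this sketch */

typedef struct {
    uint8_t  mac[6];
    int      port;                /* port the address was learned on        */
    uint32_t last_seen;           /* tick of the last frame sent by the MAC */
    int      valid;
} mac_entry_t;

typedef struct {
    mac_entry_t e[TABLE_SIZE];
} mac_table_t;

void table_init(mac_table_t *t)
{
    memset(t, 0, sizeof *t);
}

/* Learn or refresh the source MAC of a frame.
 * Returns the slot index, or -1 if the table is full. */
int table_learn(mac_table_t *t, const uint8_t mac[6], int port, uint32_t now)
{
    int slot = -1;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].valid && memcmp(t->e[i].mac, mac, 6) == 0) {
            slot = i;             /* existing entry: refresh it */
            break;
        }
        if (!t->e[i].valid && slot < 0) {
            slot = i;             /* remember the first free slot */
        }
    }
    if (slot < 0) {
        return -1;
    }
    memcpy(t->e[slot].mac, mac, 6);
    t->e[slot].port = port;
    t->e[slot].last_seen = now;
    t->e[slot].valid = 1;
    return slot;
}

/* Aging pass: drop entries idle for longer than max_idle ticks.
 * Returns the number of entries removed. */
int table_age(mac_table_t *t, uint32_t now, uint32_t max_idle)
{
    int removed = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].valid && (uint32_t)(now - t->e[i].last_seen) > max_idle) {
            t->e[i].valid = 0;
            removed++;
        }
    }
    return removed;
}

/* Returns the learned port for a destination MAC, or -1 if unknown. */
int table_lookup(const mac_table_t *t, const uint8_t mac[6])
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->e[i].valid && memcmp(t->e[i].mac, mac, 6) == 0) {
            return t->e[i].port;
        }
    }
    return -1;
}
```

In this model, a host that only receives traffic never refreshes its own entry, so after one aging period its address is unknown to the switch again.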

  • Rasty,

    I found a Linux machine and did the test. Because the screenshot would reveal my identity (I am logged in when I run it), I will not upload it and will just discuss the observations here.
    1. When the Linux machine boots up, the "arp -a" output is empty (I have only one interface, configured with a static IP)
    2. I added an ARP entry using "arp -s IP MAC" and used tcpdump to monitor the interface
    3. I pinged the EVM and saw ICMP pings from the PC to the EVM without replies; there was no ARP broadcast on the Ethernet interface.
    4. I knew from the EVM CPSW statistics that the packets were received
    5. I modified ALE_CONTROL from 0x8000_0000 to 0x8000_0010 to enable ALE bypass.
    6. The ping got replies after the change, so the received packets had not been delivered to the host port before.
    =======
    Extra:
    7. I set ALE_CONTROL back to ALE bypass disabled. I deleted the ARP entry, then added the ARP entry back, and I could still ping the EVM and get replies.

    I will discuss with the developer whether enabling ALE_BYPASS is recommended for this issue.

    Regards, Eric
  • Rasty,

    The NDK NIMU example is a ping test. That is, the remote PC sends the first packet to the AM57x (not the other way around, with the AM57x sending the first packet to the PC). When there is an ARP broadcast from the PC, then, as you said, "ARP broadcast penetrates through switch and answer causes re-learning of MAC table", and this is added to the AM572x's lookup table. So when the PC sends a packet, the switch knows the address and delivers the packet to the host port.

    In the Linux setup, there is no ARP and the AM572x lookup table is not updated. The switch doesn't know which port to send the packets received from the PC to. If ALE bypass is enabled, it forwards all packets to the host. If ALE bypass is disabled, it is supposed to send them to all ports (including the host port). For some reason, it looks like forwarding to the host port didn't happen.

    There may be some ways to make this work:
    1. Enable ALE_BYPASS. The drawback is that all packets go to the host, even unwanted ones, which burdens the software with discarding or processing them
    2. Make the NDK/NIMU software smarter: add the entry to the learning table when it first runs, or send a packet out periodically to refresh the lookup table
    3. Find out why the packets received on the port were not forwarded to the host

    These are just some thoughts for discussion; we don't have a plan for what to do or when at this moment. Please let me know your feedback: is enabling ALE BYPASS a solution for you, or can you use it as a workaround? Or what do you expect from our side? Then I can align with the development team.

    Regards, Eric
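
The receive-side behaviour discussed in this post can be summarised as a small decision function. This is a toy model of what was observed in the thread, not a description of the ALE hardware: the type and function names are hypothetical, and the "unknown unicast is not delivered to the host" branch models the observation from the Linux test rather than documented behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PORT_HOST   0      /* the host port is port 0 in this thread        */
#define PORT_NONE  (-1)    /* the frame does not reach any port in the model */

typedef struct {
    uint8_t mac[6];
    int     port;
} fdb_entry_t;

static const uint8_t BROADCAST_MAC[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

/* Decide where a frame received on the external port ends up. */
int forward_rx(const fdb_entry_t *fdb, size_t n,
               const uint8_t dst[6], int bypass)
{
    if (bypass) {
        return PORT_HOST;          /* bypass: every received frame goes to the host */
    }
    if (memcmp(dst, BROADCAST_MAC, 6) == 0) {
        return PORT_HOST;          /* broadcast reaches the host (other ports ignored here) */
    }
    for (size_t i = 0; i < n; i++) {
        if (memcmp(fdb[i].mac, dst, 6) == 0) {
            return fdb[i].port;    /* known unicast: deliver to the learned port */
        }
    }
    /* Unknown unicast: the thread expects flooding to all ports including
     * the host, but the test showed the frame not reaching the host.
     * The sketch models the observed behaviour. */
    return PORT_NONE;
}
```

With an empty table and bypass off, a unicast ping to the target's MAC is lost in this model, while the same frame is delivered either with bypass on or once the target's MAC has been learned on port 0.
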
  • Hi Eric,

    We already found the ALE_BYPASS workaround. It works well.

    I think that the NDK should implement #2. If the NDK tells the switch to forget the MAC table, it also has to send some MAC advertisement frame, or aging should be disabled by default.

    Best regards

    Rasty
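
Option #2 from the discussion, sending a frame periodically so the switch keeps the host's MAC address in its table, could be sketched as a small timer helper. Everything here is hypothetical: the structure, the function names and the transmit hook are not part of the NDK or NIMU, and the frame to send (a gratuitous ARP would be one candidate) is left to the hook. The only requirement the sketch encodes is that the refresh interval must be shorter than the switch's aging time.

```c
#include <stdint.h>

/* Hypothetical transmit hook: a real one would send a frame whose source
 * address is the host MAC. */
typedef void (*send_announce_fn)(void *ctx);

typedef struct {
    uint32_t         interval;    /* refresh period in ticks; keep it below the aging time */
    uint32_t         last_sent;   /* tick of the last frame transmitted by the host        */
    send_announce_fn send;
    void            *ctx;
} mac_refresher_t;

void refresher_init(mac_refresher_t *r, uint32_t interval, uint32_t now,
                    send_announce_fn send, void *ctx)
{
    r->interval  = interval;
    r->last_sent = now;
    r->send      = send;
    r->ctx       = ctx;
}

/* Call whenever the host transmits any frame: normal traffic already
 * refreshes the table entry, so the timer can restart. */
void refresher_note_tx(mac_refresher_t *r, uint32_t now)
{
    r->last_sent = now;
}

/* Call periodically. Sends an announcement if the host has been silent
 * for a full interval. Returns 1 if a frame was sent, 0 otherwise. */
int refresher_tick(mac_refresher_t *r, uint32_t now)
{
    if ((uint32_t)(now - r->last_sent) >= r->interval) {
        r->send(r->ctx);
        r->last_sent = now;
        return 1;
    }
    return 0;
}

/* Example hook for host-side testing: it only counts calls. */
void count_announce(void *ctx)
{
    ++*(unsigned *)ctx;
}
```

The unsigned subtraction keeps the comparison correct when the tick counter wraps around.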