AM3352 Ethernet Bonding Problem

Other Parts Discussed in Thread: AM3352

Hello,

I'm using a custom board based on the BeagleBone Black, with an AM3352 processor. The board has dual Ethernet (eth0 and eth1), and I'm trying to use bonding as a fallback in case one of the links goes down. Both Ethernet ports are connected to the same external Ethernet switch (same subnet). I test the bonding as follows: while pinging the board continuously from another PC, I remove the eth0 cable to see whether the connection falls back to eth1 and the ping responses resume. I've done several tests with three different Ethernet switches, and the results are inconsistent. Sometimes the ping responses resume on eth1 after 30-40 seconds. More often, they don't resume until I reconnect eth0 and disconnect eth1. I've included some background below.


1. Ethernet Switches used:
--------------------------
NETGEAR FS108                 (Works sometimes)
CISCO CATALYST 2960 SERIES SI (Worked once)
NETGEAR GS108                 (Has not worked yet)


2. I've made the following changes to the code to enable dual EMAC mode and remove VLAN tags:
--------------------------------------------------------------------------
In file: "am33xx.dtsi":
dual_emac = <1>;
dual_emac_res_vlan = <1>;
dual_emac_res_vlan = <2>;

In file: "cpsw.c"
//[PATCH] remove vlan tags in CPSW dual emac mode.
// control_reg |= CPSW_VLAN_AWARE;
control_reg &= ~CPSW_VLAN_AWARE;
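The cpsw.c change swaps a bitwise OR for an AND with the complement, so the VLAN-aware flag is cleared while every other bit of control_reg is kept. The sketch below isolates that operation. The bit position used for CPSW_VLAN_AWARE here is an assumption for illustration only, so check the definition in your own cpsw.c; the two function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit position, for illustration only: check the actual
 * CPSW_VLAN_AWARE definition in your kernel's cpsw.c. */
#define CPSW_VLAN_AWARE (UINT32_C(1) << 1)

/* Original line: force VLAN-aware mode on. */
uint32_t set_vlan_aware(uint32_t control_reg)
{
	return control_reg | CPSW_VLAN_AWARE;
}

/* Patched line: force VLAN-aware mode off, keeping all other bits. */
uint32_t clear_vlan_aware(uint32_t control_reg)
{
	return control_reg & ~CPSW_VLAN_AWARE;
}
```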


3. After booting Linux, I issue the following commands to set up bonding:
----------------------------------------------------------------------
echo +bond0 > /sys/class/net/bonding_masters           # Add the bond0 interface
ifconfig eth0 down                                     # Bring eth0 down
ifconfig eth1 down                                     # Bring eth1 down
echo 1 > /sys/class/net/bond0/bonding/mode             # Set mode 1 (active-backup)
ifconfig bond0 192.168.0.17 netmask 255.255.255.0 up   # Configure the IP address
echo 100 > /sys/class/net/bond0/bonding/miimon         # Poll link (MII) status every 100 ms
echo +eth0 > /sys/class/net/bond0/bonding/slaves       # Add eth0 to the bond
echo +eth1 > /sys/class/net/bond0/bonding/slaves       # Add eth1 to the bond
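For scripting the same setup, each `echo VALUE > FILE` above is a single write to a sysfs attribute. The sketch below performs those writes from C. It assumes eth0 and eth1 have already been taken down (on kernels of this era the bonding driver appears to reject enslaving an interface that is up), and it leaves assigning the address and bringing bond0 up to ifconfig afterwards; moving that step to the end is an assumption of this sketch, not the original sequence. The mode write stays before bond0 comes up, because the driver refuses a mode change while the bond is up. The names `sysfs_write`, `bond_step` and `bond_apply` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* C equivalent of `echo VALUE > PATH`. Returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	/* echo appends a newline; the bonding attributes accept it. */
	ok = fprintf(f, "%s\n", value) >= 0;
	/* stdio buffers the data, so a sysfs store error usually
	 * surfaces when fclose() flushes the buffer. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

struct bond_step {
	const char *path;
	const char *value;
	const char *what;
};

/* Same writes and order as the commands above, minus the ifconfig steps.
 * Assumes eth0 and eth1 are already down; bond0 gets its address and is
 * brought up with ifconfig afterwards (reordering assumed by this sketch). */
static const struct bond_step bond_steps[] = {
	{ "/sys/class/net/bonding_masters",      "+bond0", "create bond0" },
	{ "/sys/class/net/bond0/bonding/mode",   "1",      "active-backup, bond must be down" },
	{ "/sys/class/net/bond0/bonding/miimon", "100",    "poll link state every 100 ms" },
	{ "/sys/class/net/bond0/bonding/slaves", "+eth0",  "enslave eth0" },
	{ "/sys/class/net/bond0/bonding/slaves", "+eth1",  "enslave eth1" },
};

/* Apply steps in order, stopping at the first failure.
 * Returns the index of the failed step, or -1 if all succeeded. */
int bond_apply(const struct bond_step *steps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sysfs_write(steps[i].path, steps[i].value) != 0) {
			fprintf(stderr, "step %zu (%s) failed: %s\n",
			        i, steps[i].what, steps[i].path);
			return (int)i;
		}
	}
	return -1;
}
```

Calling `bond_apply(bond_steps, sizeof bond_steps / sizeof bond_steps[0])` needs root; the returned index points at the step the kernel rejected, which is easier to diagnose than a silent `echo` failure in a script.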


4. When I pull out the eth0 cable while pinging my unit, the following messages indicate that the bond has switched to eth1, but the ping responses don't always resume:
--------------------------------------------------------------------------
[ 1430.239001] libphy: 4a101000.mdio:03 - Link is Down
[ 1430.302262] bonding: bond0: link status definitely down for interface eth0, disabling it
[ 1430.310806] bonding: bond0: making interface eth1 the new active one.
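The kernel log only reports what the bonding driver decided. To check the bond's own view at the moment the pings stop, the active slave is also listed in /proc/net/bonding/bond0 on a line beginning "Currently Active Slave:". A minimal sketch that extracts the name from that file follows; the helper names are hypothetical, and the 4 KB read assumes the line appears near the top of the file, as it does in active-backup mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the name on the "Currently Active Slave:" line of a
 * /proc/net/bonding/<bond> dump. Returns 0 if found, -1 otherwise. */
int active_slave_from_text(const char *text, char *out, size_t outlen)
{
	static const char key[] = "Currently Active Slave: ";
	const char *p;
	size_t n;

	assert(text != NULL && out != NULL && outlen > 0);
	p = strstr(text, key);
	if (p == NULL)
		return -1;
	p += sizeof key - 1;           /* skip the key, not its '\0' */
	n = strcspn(p, "\r\n");        /* the name runs to end of line */
	if (n == 0 || n >= outlen)
		return -1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}

/* Read the start of e.g. /proc/net/bonding/bond0 and extract the active slave. */
int active_slave(const char *proc_path, char *out, size_t outlen)
{
	char buf[4096];
	size_t len;
	FILE *f = fopen(proc_path, "r");

	if (f == NULL)
		return -1;
	len = fread(buf, 1, sizeof buf - 1, f);
	fclose(f);
	buf[len] = '\0';
	return active_slave_from_text(buf, out, outlen);
}
```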


QUESTIONS:
1. What would cause the fallback to fail?
2. What can I do to make the fallback work consistently?

Thank you,
Everett

  • Hi Everett,

    I will forward this to the Ethernet experts. Feedback will be posted here.

  • Which version of the kernel are you using?
    Is there a reason why you are disabling VLAN-aware mode? This is the second time I have seen people make this modification.
    Were there any kernel configuration changes that you made?
  • Hello Biser,

    Here are the answers to your questions:

    1. Which version of the kernel are you using?
    Answer:
    3.12.9

    2. Is there a reason why you are disabling VLAN-aware mode? This is the second time I have seen people make this modification.
    Answer:
    When connecting through a "Cisco Catalyst 2960 Series SI" external switch, the VLAN tag caused incoming packets not to be recognized by Linux. I'm not sure whether the VLAN tag is added by the Cisco switch or by the TI AM3352 internal switch (when connected through the Cisco), but disabling VLAN-aware mode fixed the problem. I only saw this problem with this Cisco switch; it did not occur with a Netgear FS108.
    Note: This was a separate problem, which caused packets not to be recognized for all network communication. The VLAN fix did not solve the bonding/failover issue.
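An 802.1Q tag is four extra bytes after the source MAC address: the TPID 0x8100 sits where the EtherType normally is, followed by a 16-bit TCI whose low 12 bits are the VLAN ID. A receiver that is not expecting the tag sees 0x8100 instead of, for example, 0x0800 (IPv4), which would be consistent with packets not being recognized. The sketch below classifies a raw frame, for instance one captured on the PC side, so you can see whether frames carry a tag and which VID (such as the reserved dual_emac_res_vlan values 1 and 2 above) they use. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100u

/* If the Ethernet frame carries an 802.1Q tag, store its 12-bit VLAN ID
 * in *vid and return 1; return 0 for an untagged frame, -1 if too short. */
int vlan_id_of_frame(const uint8_t *frame, size_t len, uint16_t *vid)
{
	uint16_t type, tci;

	assert(frame != NULL && vid != NULL);
	/* dst MAC (6) + src MAC (6) + EtherType or TPID (2) */
	if (len < 14)
		return -1;
	type = (uint16_t)((frame[12] << 8) | frame[13]);
	if (type != TPID_8021Q)
		return 0;
	/* a tagged frame also needs the TCI (2) and the inner EtherType (2) */
	if (len < 18)
		return -1;
	tci = (uint16_t)((frame[14] << 8) | frame[15]);
	*vid = tci & 0x0FFFu;   /* top 4 TCI bits are PCP (3) and DEI (1) */
	return 1;
}
```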

    3. Were there any kernel configuration changes that you made?
    Answer:
    No changes were made (that made any difference). I tried configuring the kernel (using menuconfig) to use the driver for our particular PHY (Micrel KSZ9031RNX), but this didn't make any difference.

    Thank you for your time,
    Everett
  • Hello Schuyler Patton,

    It's been almost a week since I posted the answers to your questions. Have you had a chance to consider the questions from my original post dated Feb 28, 2015 12:10 AM regarding Ethernet Bonding/Fallback?

    QUESTIONS:
    1. What would cause the fallback to fail?
    2. What can I do to make the fallback work consistently?

    Thank you for your time,

    Everett

  • Hello Everett,

    Apologies for the delay in getting back to you.

    The reason I asked is that the 7.0 SDK kernel had a bug where the second Ethernet port did not remove VLAN tagging on packet egress. Your description sounds similar. Disabling VLAN-aware mode may be one workaround, but there is a patch that fixes this problem here if you would like to try it.

    http://processors.wiki.ti.com/index.php/Cpsw_3_12_dhcp_lease_fail_dual_emac

    Is it possible to reverse the bonding roles so that eth1 is the primary and eth0 the secondary? I want to make sure that eth1 is able to send, or to act as the primary port.

    What does ethtool report for both eth0 and eth1? I want to make sure both PHYs show a link.

  • Hello Schuyler,

    I applied the patch to the cpsw.c file and rebuilt the kernel, but there was no change. The console messages indicate that the switchover is taking place (see below), but once the eth0 cable is removed, there is no response to a ping until I replace the eth0 cable and remove the eth1 cable. I've included some information below. I have not tried reversing the eth0 and eth1 ports yet; I'll try that on Monday.

    Any ideas?

    Thank you for your time,
    Everett


    Messages sent when eth0 is removed:
    -----------------------------------
    [ 421.631008] libphy: 4a101000.mdio:03 - Link is Down
    [ 421.714262] bonding: bond0: link status definitely down for interface eth0, disabling it
    [ 421.722806] bonding: bond0: making interface eth1 the new active one.


    Response to ethtool command (with both eth0 and eth1 connected):
    ----------------------------------------------------------------
    root@192:~# ethtool eth0
    Settings for eth0:
        Supported ports: [ TP AUI BNC MII FIBRE ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 3
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
        Link detected: yes

    root@192:~# ethtool eth1
    Settings for eth1:
        Supported ports: [ TP AUI BNC MII FIBRE ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 7
        Transceiver: external
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000000 (0)
        Link detected: yes
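Both reports show "Link detected: yes". The carrier flag that the bonding driver's miimon polling reads (with its default use_carrier=1 setting) can also be read from /sys/class/net/<interface>/carrier, so the link state bonding acts on can be logged alongside the ping test. A minimal sketch follows; the function name is hypothetical, and note that the kernel refuses the read while the interface is administratively down, which the sketch reports as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Read /sys/class/net/<ifname>/carrier: 1 = carrier up, 0 = down,
 * -1 = unreadable (no such interface, or the interface is
 * administratively down, in which case the kernel rejects the read). */
int carrier_state(const char *ifname)
{
	char path[128];
	FILE *f;
	int n, c;

	assert(ifname != NULL);
	n = snprintf(path, sizeof path, "/sys/class/net/%s/carrier", ifname);
	if (n < 0 || (size_t)n >= sizeof path)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	c = fgetc(f);               /* EOF if the read was rejected */
	fclose(f);
	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}
```

For example, printing `carrier_state("eth0")` and `carrier_state("eth1")` once per second while unplugging cables shows whether the carrier changes line up with the bonding messages.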

  • Hi Everett,
    Both PHYs show a link, which is what I was looking for. I will try to set this up tomorrow. To make sure I set this up correctly, is there anything in the switch you are connecting to that is required for the failover process to work?
    Did reversing the ports make a difference?
  • Hi Schuyler,

    I've answered your questions below. I've also added some new information:

    Question 1. To make sure I set this up correctly, is there anything in the switch you are connecting to that is required for the failover process to work?
    Answer:
    I didn't change anything in the switches. I'm using a Netgear FS-108, and a CISCO Catalyst 2960 Series SI. I have not configured anything in either of them.

    Question 2. Did reversing the ports make a difference?
    Answer:
    I did not try reversing the ports. I'm not sure how to set that up. I'll list the commands I'm issuing at the Linux prompt to set up bonding. If you can tell me how to reverse the ports, I'll try it here.

    Linux commands to set up bonding:
    ---------------------------------
    echo +bond0 > /sys/class/net/bonding_masters           # Add the bond0 interface
    ifconfig eth0 down                                     # Bring eth0 down
    ifconfig eth1 down                                     # Bring eth1 down
    echo 1 > /sys/class/net/bond0/bonding/mode             # Set mode 1 (active-backup)
    ifconfig bond0 192.168.0.17 netmask 255.255.255.0 up   # Configure the IP address
    echo 100 > /sys/class/net/bond0/bonding/miimon         # Poll link (MII) status every 100 ms
    echo +eth0 > /sys/class/net/bond0/bonding/slaves       # Add eth0 to the bond
    echo +eth1 > /sys/class/net/bond0/bonding/slaves       # Add eth1 to the bond
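On reversing the roles: the bonding driver has a `primary` attribute next to `mode` and `miimon`; in active-backup mode the slave named there is preferred and made active whenever its link is up. Writing eth1 to it after the slaves are added (the equivalent of `echo eth1 > /sys/class/net/bond0/bonding/primary`) should make eth1 the preferred port. A minimal sketch in C; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Build /sys/class/net/<bond>/bonding/<attr> into buf.
 * Returns 0 on success, -1 if the result would not fit. */
int bond_attr_path(char *buf, size_t len, const char *bond, const char *attr)
{
	int n;

	assert(buf != NULL && bond != NULL && attr != NULL);
	n = snprintf(buf, len, "/sys/class/net/%s/bonding/%s", bond, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of `echo <slave> > /sys/class/net/<bond>/bonding/primary`.
 * Returns 0 on success, -1 on error. */
int bond_set_primary(const char *bond, const char *slave)
{
	char path[128];
	FILE *f;
	int ok;

	if (bond_attr_path(path, sizeof path, bond, "primary") != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%s\n", slave) >= 0;
	if (fclose(f) != 0)        /* sysfs store errors surface on flush */
		ok = 0;
	return ok ? 0 : -1;
}
```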

    New Information:
    ----------------
    I'm using bonding mainly for failover, in case one of the connections is lost. Both eth0 and eth1 are connected to the SAME switch (subnet). I've tried bonding mode 1 (active-backup) and mode 0 (round-robin). No matter which mode I use, I can't seem to receive data on eth1. In active-backup mode, there is no ping response when eth0 is removed. In round-robin mode, I lose half the packets with both eth0 and eth1 connected, and I get no response with eth0 disconnected. "ethtool" and "ifconfig" seem to show that eth1 is OK, but it's not working as a failover backup.

    If I run "ifconfig" with both eth0 and eth1 connected, I notice that no interrupt is listed for eth1, even though both PHY interrupts are connected to the associated lines on the AM3352. Could this be a problem? (The output of the command is included below.)

    Output for "ifconfig" command (Notice no interrupt listed for eth1):
    --------------------------------------------------------------------
    root@192:~# ifconfig
    bond0     Link encap:Ethernet  HWaddr 1C:BA:8C:E0:D9:CA
              inet addr:192.168.0.17  Bcast:192.168.0.255  Mask:255.255.255.0
              inet6 addr: fe80::1eba:8cff:fee0:d9ca/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:666 errors:0 dropped:5 overruns:0 frame:0
              TX packets:61 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:42216 (41.2 Kb)  TX bytes:5494 (5.3 Kb)

    eth0      Link encap:Ethernet  HWaddr 1C:BA:8C:E0:D9:CA
              inet addr:192.168.0.17  Bcast:192.168.0.255  Mask:255.255.255.0
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:664 errors:0 dropped:0 overruns:0 frame:0
              TX packets:35 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:42096 (41.1 Kb)  TX bytes:3082 (3.0 Kb)
              Interrupt:56

    eth1      Link encap:Ethernet  HWaddr 1C:BA:8C:E0:D9:CA
              inet addr:192.168.0.18  Bcast:192.168.0.255  Mask:255.255.255.0
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:2 errors:0 dropped:0 overruns:0 frame:0
              TX packets:26 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:120 (120.0 b)  TX bytes:2412 (2.3 Kb)

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


    Thank you,
    Everett

  • Everett,
    I need to apologize for the delay. We are moving offices, so I will not be able to set up this experiment until after the 30th.
  • Hi Schuyler,

    Thanks for the reply.

    I've done some more testing since your last post, and I have a little more information.

    Test procedure summary:
    -----------------------
    1. Configure the hardware connections and boot the AM3352 unit.
    eth0 and eth1 connected with 2 cables to the same external switch (CISCO Catalyst 2960 Series SI).
    2. Enter the following commands at the Linux prompt:
    echo +bond0 > /sys/class/net/bonding_masters
    ifconfig eth0 down
    ifconfig eth1 down
    echo 1 > /sys/class/net/bond0/bonding/mode
    ifconfig bond0 192.168.0.17 netmask 255.255.255.0 up
    echo 100 > /sys/class/net/bond0/bonding/miimon
    echo +eth0 > /sys/class/net/bond0/bonding/slaves
    echo +eth1 > /sys/class/net/bond0/bonding/slaves
    3. Start a continuous ping from a Linux PC to the AM3352 board.
    4. Remove and replace eth0 & eth1 etc.
    4.1 Observe the bonding messages at the AM3352 Linux console.
    4.2 Observe the ping responses at the Linux PC doing the pinging.

    Test results:
    -------------
    PPC board:
    With our PPC dual Ethernet boards, bonding/failover works perfectly. The correct bonding messages appear on the console. Failover from eth0 to eth1 takes place within a couple of seconds of removing the eth0 cable, and the ping responses resume. When I reinstall eth0, the ping responses continue undisturbed. When I then remove eth1, the correct messages appear, failover back to eth0 takes place within a couple of seconds, and the ping responses resume.

    AM3352 Board:
    With the AM3352 board, bonding/failover does not work properly. The correct messages appear on the AM3352 console, but after eth0 is removed, the ping responses stop and don't resume until about 50 seconds later. Then, when the eth0 cable is replaced, the ping responses stop and don't resume at all until eth1 is removed, as if some internal switching took place when eth0 came back up. When eth1 is removed, the ping responses resume within a few seconds.

    From these symptoms, it appears the AM3352 internal Ethernet switch function might be conflicting with the bonding/failover function of the Linux OS.
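Later in this thread, the failure is traced to the AM335x internal switch relearning the device's own MAC address on an external port, after which packet reception at the device stops. The toy model below is not the CPSW ALE; it only illustrates the learning rule involved: a frame's source address moves the table entry to the port the frame arrived on, so a reflected copy of the device's own frame pulls the entry away from the host port, and disabling learning on the external ports prevents that. All names are hypothetical, and whether the same thing happens on this board in dual EMAC mode is not established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HOST_PORT  0
#define NUM_PORTS  3      /* host port 0, external ports 1 and 2 */
#define TABLE_SIZE 8
#define NO_PORT    (-1)   /* unknown destination: a real switch floods */

/* Toy address table: NOT the CPSW ALE, just the learning idea. */
struct toy_switch {
	unsigned char mac[TABLE_SIZE][6];
	int port[TABLE_SIZE];
	int used;
	bool learn[NUM_PORTS];   /* per-port learning enable */
};

static int toy_find(const struct toy_switch *sw, const unsigned char mac[6])
{
	for (int i = 0; i < sw->used; i++)
		if (memcmp(sw->mac[i], mac, 6) == 0)
			return i;
	return -1;
}

/* A frame with source address src arrives on port: learn or update. */
void toy_ingress(struct toy_switch *sw, int port, const unsigned char src[6])
{
	int i;

	assert(port >= 0 && port < NUM_PORTS);
	if (!sw->learn[port])
		return;
	i = toy_find(sw, src);
	if (i < 0) {
		if (sw->used == TABLE_SIZE)
			return;              /* table full: ignored in this toy */
		i = sw->used++;
		memcpy(sw->mac[i], src, 6);
	}
	sw->port[i] = port;              /* last port seen wins */
}

/* Port a frame for dst would be forwarded to, or NO_PORT if unknown. */
int toy_lookup(const struct toy_switch *sw, const unsigned char dst[6])
{
	int i = toy_find(sw, dst);
	return i < 0 ? NO_PORT : sw->port[i];
}
```

In this model, once the device's own address has been learned on port 1, frames addressed to the device are forwarded out of port 1 rather than to the host, which matches the "no further reception" symptom; with learning disabled on ports 1 and 2 the host-port entry is never overwritten.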

    Please get back to me when you've had a chance to set up the test.

    Thank you for your time,
    Everett

  • Hi Schuyler,

    In your last reply, you said you were not going to have a chance to try my setup until after March 30th. I haven't heard anything since then. Will you have a chance to look at that soon?

    Thank you,

    Everett

  • Everett,
    Sorry for the delay; I am starting to look at this now. I am going to try a TP-LINK TL-SG2216 switch for testing. After the failover, you mentioned that the pings are not resuming. Is the TX packet count increasing for the interface?
    Is there anything special about the switches you are testing with that your application requires?
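One way to answer the TX-count question is to sample the interface statistics twice while the ping runs after a failover: if eth1's tx_packets does not grow, the replies are not leaving through eth1; if it grows but the PC still sees nothing, the replies are most likely being lost further along the path. The counters are exposed under /sys/class/net/<interface>/statistics/. A minimal sketch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Read /sys/class/net/<ifname>/statistics/<counter> (e.g. "tx_packets").
 * Returns 0 and stores the value in *out, or -1 on error. */
int read_counter(const char *ifname, const char *counter, unsigned long long *out)
{
	char path[160];
	FILE *f;
	int n;

	assert(ifname != NULL && counter != NULL && out != NULL);
	n = snprintf(path, sizeof path, "/sys/class/net/%s/statistics/%s",
	             ifname, counter);
	if (n < 0 || (size_t)n >= sizeof path)
		return -1;
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	n = fscanf(f, "%llu", out);
	fclose(f);
	return n == 1 ? 0 : -1;
}

/* Packets counted between two samples. The counters only grow, so a
 * smaller second sample means a wrap or reset: delta unknown (-1). */
long long counter_delta(unsigned long long before, unsigned long long after)
{
	return after >= before ? (long long)(after - before) : -1;
}
```

Sampling both tx_packets and rx_packets on eth1 a few seconds apart distinguishes "requests never arrive" from "replies never leave".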
  • Everett,

    I cannot replicate your error using our AM335x Starter Kit EVM (http://processors.wiki.ti.com/index.php/AM335x_Starter_Kit) and a TP-LINK TL-SG2216 switch (and the 3.14 Linux kernel that we provide with the AMSDK 8.0).

    Here are the steps I took:

    • Created a static Link Aggregation Group on two of the ports (port 5 and 7 in my case) of the TP-LINK switch using the configuration GUI
    • Rebuilt my kernel to include bonding support (menuconfig: Device Drivers --> Network device support --> Bonding Driver Support)
    • Booted up the Starter Kit EVM using TFTP boot mode
      • My eth0 interface uses DHCP during the boot process to get an IP and then request a file transfer of the kernel from my Ubuntu development machine
      • Since I'm booting this way I left the other Ethernet port unplugged during boot
    • Once I reached the login prompt I plugged in the other Ethernet port to the other port on the TP-LINK switch that is in the same aggregation group
      • At this point both ports on my EVM are plugged into the two aggregated ports on the switch
      • The eth0 interface is also up and has used DHCP to acquire its IP address
    • Now I followed your instructions and typed these commands. I assigned the bond0 interface the same IP address and netmask that eth0 received from DHCP during boot, and I did not create bond0 through sysfs because it already exists when my boot completes:
      • ifconfig eth0 down
      • echo 1 > /sys/class/net/bond0/bonding/mode
      • ifconfig bond0 128.247.125.229 netmask 255.255.254.0 up
      • echo 100 > /sys/class/net/bond0/bonding/miimon
      • echo +eth0 > /sys/class/net/bond0/bonding/slaves
      • echo +eth1 > /sys/class/net/bond0/bonding/slaves
    • At this point if I type 'ifconfig' I can see all three interfaces (bond0, eth0, eth1) and all three of them share the same MAC address (only bond0 and eth0 show the same IP address, eth1 doesn't show an IP address)
    • I start pinging the EVM from my Ubuntu development machine and I see the pings arriving successfully
    • Now I unplug the Ethernet cables one at a time and see the console output telling me that one interface has gone down so the other one is becoming the active interface, now an interface is back up, etc. 
    • On the pinging side I can see one or two pings get missed while the active interface is switching, but after the new interface has become active (1 or 2 seconds at most) I see the pinging resume. I can unplug and replug the Ethernet cables multiple times and see the pinging resume each time (I've attached the EVM console output as well as the Ubuntu machine console output for reference).

    Do you happen to have a Starter Kit EVM that you can test with? 

    Thanks,

    Jason Reeder

    2818.EVM_Console_Output.txt
     _____                    _____           _         _   
    |  _  |___ ___ ___ ___   |  _  |___ ___  |_|___ ___| |_ 
    |     |  _| .'| . | . |  |   __|  _| . | | | -_|  _|  _|
    |__|__|_| |__,|_  |___|  |__|  |_| |___|_| |___|___|_|  
                  |___|                    |___|            
    
    Arago Project http://arago-project.org am335x-evm /dev/ttyO0
    
    Arago 2015.02 am335x-evm /dev/ttyO0
    
    am335x-evm login: root
    root@am335x-evm:~# ifconfig
    eth0      Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              inet addr:128.247.125.229  Bcast:0.0.0.0  Mask:255.255.254.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:61 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:6035 (5.8 KiB)  TX bytes:684 (684.0 B)
              Interrupt:56 
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:101 errors:0 dropped:0 overruns:0 frame:0
              TX packets:101 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:186778 (182.4 KiB)  TX bytes:186778 (182.4 KiB)
    
    root@am335x-evm:~# ifconfig eth0 down
    root@am335x-evm:~# echo 1 > /sys/class/net/bond0/bonding/mode 
    root@am335x-evm:~# ifconfig bond0 128.247.125.229 netmask 255.255.254.0 up
    root@am335x-evm:~# ifconfig
    bond0     Link encap:Ethernet  HWaddr BE:D6:53:B7:18:3E  
              inet addr:128.247.125.229  Bcast:128.247.125.255  Mask:255.255.254.0
              UP BROADCAST MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:119 errors:0 dropped:0 overruns:0 frame:0
              TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:187714 (183.3 KiB)  TX bytes:187714 (183.3 KiB)
    
    root@am335x-evm:~# echo 100 > /sys/class/net/bond0/bonding/miimon 
    [  115.437299] bonding: bond0: Setting MII monitoring interval to 100.
    root@am335x-evm:~# echo +eth0 > /sys/class/net/bond0/bonding/slaves 
    [  130.977259] bonding: bond0: Adding slave eth0.
    [  130.983468] net eth0: initializing cpsw version 1.12 (0)
    [  131.067070] net eth0: phy found : id is : 0x4dd074
    [  131.085644] bonding: bond0: enslaving eth0 as a backup interface with a down link.
    root@am335x-evm:~# 
    root@am335x-evm:~# [  135.067316] libphy: 4a101000.mdio:00 - Link is Up - 1000/Full
    [  135.136316] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
    [  135.145591] bonding: bond0: making interface eth0 the new active one.
    [  135.158094] bonding: bond0: first active interface up!
    
    root@am335x-evm:~# echo +eth1 > /sys/class/net/bond0/bonding/slaves 
    [  141.667311] bonding: bond0: Adding slave eth1.
    [  141.674649] net eth1: initializing cpsw version 1.12 (0)
    [  141.757095] net eth1: phy found : id is : 0x4dd074
    [  141.766936] bonding: bond0: enslaving eth1 as a backup interface with a down link.
    root@am335x-evm:~# 
    root@am335x-evm:~# [  145.757077] libphy: 4a101000.mdio:01 - Link is Up - 1000/Full
    [  145.856327] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
    
    root@am335x-evm:~# 
    root@am335x-evm:~# ifconfig
    bond0     Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              inet addr:128.247.125.229  Bcast:128.247.125.255  Mask:255.255.254.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:166 errors:0 dropped:19 overruns:0 frame:0
              TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:14342 (14.0 KiB)  TX bytes:684 (684.0 B)
    
    eth0      Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              inet addr:128.247.125.229  Bcast:0.0.0.0  Mask:255.255.254.0
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:147 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:12786 (12.4 KiB)  TX bytes:684 (684.0 B)
              Interrupt:56 
    
    eth1      Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:19 errors:0 dropped:19 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1556 (1.5 KiB)  TX bytes:0 (0.0 B)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:119 errors:0 dropped:0 overruns:0 frame:0
              TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:187714 (183.3 KiB)  TX bytes:187714 (183.3 KiB)
    
    root@am335x-evm:~# [  183.067066] libphy: 4a101000.mdio:00 - Link is Down
    [  183.156414] bonding: bond0: link status definitely down for interface eth0, disabling it
    [  183.164968] bonding: bond0: making interface eth1 the new active one.
    
    root@am335x-evm:~# ifconfig
    bond0     Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              inet addr:128.247.125.229  Bcast:128.247.125.255  Mask:255.255.254.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:237 errors:0 dropped:38 overruns:0 frame:0
              TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:19098 (18.6 KiB)  TX bytes:744 (744.0 B)
    
    eth0      Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              inet addr:128.247.125.229  Bcast:0.0.0.0  Mask:255.255.254.0
              UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:192 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:15868 (15.4 KiB)  TX bytes:684 (684.0 B)
              Interrupt:56 
    
    eth1      Link encap:Ethernet  HWaddr D4:94:A1:8C:5E:B0  
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:45 errors:0 dropped:38 overruns:0 frame:0
              TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:3230 (3.1 KiB)  TX bytes:60 (60.0 B)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:119 errors:0 dropped:0 overruns:0 frame:0
              TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:187714 (183.3 KiB)  TX bytes:187714 (183.3 KiB)
    
    root@am335x-evm:~# [  198.067070] libphy: 4a101000.mdio:00 - Link is Up - 1000/Full
    [  198.076645] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
    
    root@am335x-evm:~# [  215.757053] libphy: 4a101000.mdio:01 - Link is Down
    [  215.776417] bonding: bond0: link status definitely down for interface eth1, disabling it
    [  215.784985] bonding: bond0: making interface eth0 the new active one.
    [  222.757067] libphy: 4a101000.mdio:01 - Link is Up - 1000/Full
    [  222.796333] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
    [  226.067027] libphy: 4a101000.mdio:00 - Link is Down
    [  226.096366] bonding: bond0: link status definitely down for interface eth0, disabling it
    [  226.104911] bonding: bond0: making interface eth1 the new active one.
    [  236.067041] libphy: 4a101000.mdio:00 - Link is Up - 1000/Full
    [  236.116339] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
    [  240.757027] libphy: 4a101000.mdio:01 - Link is Down
    [  240.816382] bonding: bond0: link status definitely down for interface eth1, disabling it
    [  240.824928] bonding: bond0: making interface eth0 the new active one.
    [  248.757054] libphy: 4a101000.mdio:01 - Link is Up - 1000/Full
    [  248.836335] bonding: bond0: link status definitely up for interface eth1, 1000 Mbps full duplex.
    [  256.067064] libphy: 4a101000.mdio:00 - Link is Down
    [  256.136386] bonding: bond0: link status definitely down for interface eth0, disabling it
    [  256.144945] bonding: bond0: making interface eth1 the new active one.
    [  264.067088] libphy: 4a101000.mdio:00 - Link is Up - 1000/Full
    [  264.156326] bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
    

    0361.Ubuntu_Pinging_Output.txt
    jason@ubuntu-hpc:~$ ping 128.247.125.229
    PING 128.247.125.229 (128.247.125.229) 56(84) bytes of data.
    64 bytes from 128.247.125.229: icmp_req=1 ttl=64 time=0.608 ms
    64 bytes from 128.247.125.229: icmp_req=2 ttl=64 time=0.385 ms
    64 bytes from 128.247.125.229: icmp_req=3 ttl=64 time=0.282 ms
    64 bytes from 128.247.125.229: icmp_req=4 ttl=64 time=0.266 ms
    64 bytes from 128.247.125.229: icmp_req=5 ttl=64 time=0.249 ms
    64 bytes from 128.247.125.229: icmp_req=6 ttl=64 time=0.265 ms
    64 bytes from 128.247.125.229: icmp_req=7 ttl=64 time=0.324 ms
    64 bytes from 128.247.125.229: icmp_req=8 ttl=64 time=0.294 ms
    64 bytes from 128.247.125.229: icmp_req=9 ttl=64 time=0.257 ms
    64 bytes from 128.247.125.229: icmp_req=11 ttl=64 time=0.314 ms
    64 bytes from 128.247.125.229: icmp_req=12 ttl=64 time=0.330 ms
    64 bytes from 128.247.125.229: icmp_req=13 ttl=64 time=0.271 ms
    64 bytes from 128.247.125.229: icmp_req=14 ttl=64 time=0.253 ms
    64 bytes from 128.247.125.229: icmp_req=15 ttl=64 time=0.249 ms
    64 bytes from 128.247.125.229: icmp_req=16 ttl=64 time=0.340 ms
    64 bytes from 128.247.125.229: icmp_req=17 ttl=64 time=0.384 ms
    64 bytes from 128.247.125.229: icmp_req=18 ttl=64 time=0.412 ms
    64 bytes from 128.247.125.229: icmp_req=19 ttl=64 time=0.263 ms
    64 bytes from 128.247.125.229: icmp_req=20 ttl=64 time=0.253 ms
    64 bytes from 128.247.125.229: icmp_req=22 ttl=64 time=0.366 ms
    64 bytes from 128.247.125.229: icmp_req=23 ttl=64 time=0.304 ms
    64 bytes from 128.247.125.229: icmp_req=24 ttl=64 time=0.283 ms
    64 bytes from 128.247.125.229: icmp_req=25 ttl=64 time=0.245 ms
    64 bytes from 128.247.125.229: icmp_req=26 ttl=64 time=0.269 ms
    64 bytes from 128.247.125.229: icmp_req=27 ttl=64 time=0.382 ms
    64 bytes from 128.247.125.229: icmp_req=28 ttl=64 time=0.284 ms
    64 bytes from 128.247.125.229: icmp_req=29 ttl=64 time=0.252 ms
    64 bytes from 128.247.125.229: icmp_req=30 ttl=64 time=0.263 ms
    64 bytes from 128.247.125.229: icmp_req=31 ttl=64 time=0.325 ms
    64 bytes from 128.247.125.229: icmp_req=32 ttl=64 time=0.387 ms
    64 bytes from 128.247.125.229: icmp_req=33 ttl=64 time=0.282 ms
    64 bytes from 128.247.125.229: icmp_req=34 ttl=64 time=0.270 ms
    64 bytes from 128.247.125.229: icmp_req=36 ttl=64 time=0.375 ms
    64 bytes from 128.247.125.229: icmp_req=37 ttl=64 time=0.422 ms
    64 bytes from 128.247.125.229: icmp_req=38 ttl=64 time=0.292 ms
    64 bytes from 128.247.125.229: icmp_req=39 ttl=64 time=0.254 ms
    64 bytes from 128.247.125.229: icmp_req=40 ttl=64 time=0.261 ms
    64 bytes from 128.247.125.229: icmp_req=41 ttl=64 time=0.254 ms
    64 bytes from 128.247.125.229: icmp_req=42 ttl=64 time=0.318 ms
    64 bytes from 128.247.125.229: icmp_req=43 ttl=64 time=0.279 ms
    64 bytes from 128.247.125.229: icmp_req=44 ttl=64 time=0.328 ms
    64 bytes from 128.247.125.229: icmp_req=45 ttl=64 time=0.258 ms
    64 bytes from 128.247.125.229: icmp_req=46 ttl=64 time=0.292 ms
    64 bytes from 128.247.125.229: icmp_req=47 ttl=64 time=0.379 ms
    64 bytes from 128.247.125.229: icmp_req=48 ttl=64 time=0.290 ms
    64 bytes from 128.247.125.229: icmp_req=51 ttl=64 time=0.366 ms
    64 bytes from 128.247.125.229: icmp_req=52 ttl=64 time=0.394 ms
    64 bytes from 128.247.125.229: icmp_req=53 ttl=64 time=0.275 ms
    64 bytes from 128.247.125.229: icmp_req=54 ttl=64 time=0.260 ms
    64 bytes from 128.247.125.229: icmp_req=55 ttl=64 time=0.248 ms
    64 bytes from 128.247.125.229: icmp_req=56 ttl=64 time=0.269 ms
    64 bytes from 128.247.125.229: icmp_req=57 ttl=64 time=0.350 ms
    64 bytes from 128.247.125.229: icmp_req=58 ttl=64 time=0.279 ms
    64 bytes from 128.247.125.229: icmp_req=59 ttl=64 time=0.335 ms
    64 bytes from 128.247.125.229: icmp_req=60 ttl=64 time=0.286 ms
    64 bytes from 128.247.125.229: icmp_req=61 ttl=64 time=0.270 ms
    64 bytes from 128.247.125.229: icmp_req=62 ttl=64 time=0.372 ms
    64 bytes from 128.247.125.229: icmp_req=63 ttl=64 time=0.271 ms
    64 bytes from 128.247.125.229: icmp_req=64 ttl=64 time=0.286 ms
    64 bytes from 128.247.125.229: icmp_req=65 ttl=64 time=0.314 ms
    64 bytes from 128.247.125.229: icmp_req=66 ttl=64 time=0.297 ms
    64 bytes from 128.247.125.229: icmp_req=67 ttl=64 time=0.376 ms
    ^C
    --- 128.247.125.229 ping statistics ---
    67 packets transmitted, 62 received, 7% packet loss, time 66000ms
    rtt min/avg/max/mdev = 0.245/0.309/0.608/0.062 ms
    jason@ubuntu-hpc:~$ 

  • Hi Jason,

    I'm sorry for the delay in getting back to you. Since I couldn't get the failback working properly, I had to put the problem aside and work on something else while waiting for you to attempt to duplicate my results.

    I'm not sure why you're not seeing the problem, and I don't have an EVM Starter Kit to duplicate your setup exactly. I'll set my configuration back up and run through the steps again, making sure the kernel is configured as you described. In the meantime, could you answer a couple of questions?

    QUESTIONS:

    1. I don't understand the first step in your list - "Created a static Link Aggregation Group on two of the ports (port 5 and 7 in my case) of the TP-LINK switch using the configuration GUI". Could you explain what you did there in more detail? I used a "NETGEAR FS108" and a "CISCO CATALYST 2960 SERIES SI" and I didn't configure anything in either of these switches.

    2. Could you try a different switch, maybe one that does not require configuration?

    Thank you for your time. I'll get back to you when I've had a chance to set it up.

    Sincerely,
    Everett

  • Everett,

    As it turns out, I do not even need to create the static Link Aggregation Group on my switch for bonding on the Starter Kit EVM to survive repeated cable unplugs and replugs while pinging.
    I am able to ping to and from the EVM simultaneously through three different switches, with no configuration on any of them (all using bonding mode 1, active-backup):
    • TP-LINK TL-SG2216 (using two ports that have not been configured)
    • D-Link DGS-1005G
    • Linksys EZXS55W
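    The exact bonding commands are not shown in this thread, so for reference here is a minimal C sketch of a mode 1 (active-backup) setup done through the bonding driver's sysfs files; each call is the equivalent of one `echo value > file` command. It assumes bond0 already exists (created via /sys/class/net/bonding_masters) and is down, that eth0 and eth1 were brought down as in the setup earlier in the thread, and that 100 ms is an illustrative miimon value, not one taken from Jason's tests. The kernel bonding documentation states that the mode can only be changed while the bond is down and that link monitoring (miimon, or arp_interval with arp_ip_target) must be enabled for failover to work. write_attr and setup_active_backup are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Equivalent of `echo value > dir/attr` for one sysfs attribute. */
int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    assert(dir != NULL && attr != NULL && value != NULL);
    n = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (n < 0 || n >= (int)sizeof path)
        return -1;                      /* path would be truncated */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%s\n", value) > 0;
    /* The buffered write reaches sysfs at fclose(), so a rejected value shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Configure a bond for active-backup failover.
 * bond_dir is e.g. "/sys/class/net/bond0/bonding" (assumed path). */
int setup_active_backup(const char *bond_dir, int miimon_ms)
{
    char ms[16];

    snprintf(ms, sizeof ms, "%d", miimon_ms);
    /* Mode 1; only accepted while the bond is down and has no slaves. */
    if (write_attr(bond_dir, "mode", "active-backup") != 0)
        return -1;
    /* MII link monitoring interval in ms; without link monitoring a pulled cable is not acted on. */
    if (write_attr(bond_dir, "miimon", ms) != 0)
        return -1;
    /* Each write to "slaves" enslaves one interface. */
    if (write_attr(bond_dir, "slaves", "+eth0") != 0 ||
        write_attr(bond_dir, "slaves", "+eth1") != 0)
        return -1;
    return 0;
}
```

    On the target this would be called as setup_active_backup("/sys/class/net/bond0/bonding", 100); bringing bond0 up and assigning its IP address are left to the existing scripts.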

    I did run into one issue while moving between different switches during the same ping session. It appears that when I moved to a switch closer to the ping source, the EVM somehow received packets that it had previously sent. This caused the EVM's internal switch to relearn its own MAC address on the wrong port, which stopped any further packet reception at the device. I corrected this by disabling address learning and source-address updating on the two external ports of the AM335x's internal switch. You can try this method using the following commands from the console on the EVM:
    • devmem2 0x4A100D44 w 0x00000033
    • devmem2 0x4A100D48 w 0x00000033

    These two commands set the NO_LEARN and NO_SA_UPDATE bits to 1 in the PORTCTL1 and PORTCTL2 registers while keeping PORT_STATE set to Forward. See sections 14.5.1.10 and 14.5.1.11 in the AM335x TRM for an explanation of these registers and bits.
    Try using the above two devmem2 commands after you have bonding configured and both ports added as slaves, but before you begin any pings.
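    To show where the value 0x00000033 and the two addresses come from, here is a small C sketch that composes the PORTCTL value from its fields. It assumes the CPSW ALE base address 0x4A100D00 with PORTCTLn at offset 0x40 + 4*n (port 0 being the host port), PORT_STATE in bits 1:0 (3 = Forward), NO_LEARN at bit 4 and NO_SA_UPDATE at bit 5. That layout matches the addresses and value in the devmem2 commands above and my reading of the TRM and the Linux cpsw_ale driver, but check it against your TRM revision. portctl_value and ALE_PORTCTL are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* CPSW ALE register block on AM335x (assumed base, matches the devmem2 addresses). */
#define ALE_BASE            0x4A100D00u
/* PORTCTLn: ALE offset 0x40, one 32-bit word per port; ports 1 and 2 are external. */
#define ALE_PORTCTL(n)      (ALE_BASE + 0x40u + 4u * (uint32_t)(n))

/* PORTCTL fields (assumed layout; verify in the TRM). */
#define PORT_STATE_FORWARD  0x3u            /* bits 1:0 */
#define NO_LEARN            (1u << 4)       /* do not learn source addresses */
#define NO_SA_UPDATE        (1u << 5)       /* do not move a learned address to this port */

uint32_t portctl_value(uint32_t state, int no_learn, int no_sa_update)
{
    uint32_t v;

    assert(state <= 3u);                    /* PORT_STATE is a 2-bit field */
    v = state;
    if (no_learn)
        v |= NO_LEARN;
    if (no_sa_update)
        v |= NO_SA_UPDATE;
    return v;
}
```

    With these definitions, portctl_value(PORT_STATE_FORWARD, 1, 1) is 0x33, and ALE_PORTCTL(1) and ALE_PORTCTL(2) are 0x4A100D44 and 0x4A100D48, the two addresses passed to devmem2.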

    Thanks,

    Jason Reeder
  • Thanks Jason, I'll give it a try.

    Everett

  • Hi Jason,

    I'm including a section from your last post for reference. Then I have a question. Please see below.


    FROM YOUR LAST POST (FOR REFERENCE):

    "...I corrected this by disabling address learning and address updating at the two external ports in the switch of our AM335x device. You can try this method using the following commands from the console on the EVM:

    • devmem2 0x4A100D44 w 0x00000033
    • devmem2 0x4A100D48 w 0x00000033

    These two commands will set the NO_LEARN and NO_SA_UPDATE bits to 1 in the PORTCTL1 and PORTCTL2 registers while also keeping the PORT_STATE set to Forward. See section 14.5.1.10 and 14.5.1.11 in the AM335x TRM for an explanation of these registers and bits.
    Try using the above two devmem2 commands after you have bonding configured and both ports added as slaves, but before you begin any pings."


    ---------------------------

    QUESTION:

    I'm trying to duplicate your test results, but I'm having a problem with a couple of the settings. I've made sure the kernel is compiled with Bonding Driver Support enabled (menuconfig: Device Drivers --> Network device support --> Bonding Driver Support). But I don't know how to configure the bits in the PORTCTL1 and PORTCTL2 registers as you suggested. The problem is that I don't have a Starter Kit EVM, so I don't have the console to enter those commands.

    Is there anything I can set in config files, or change in the kernel build or the .dts files, that will configure the bits in the control registers as you did?

    Thanks,
    Everett

  • Everett,

    Check out the attached patch. It disables learning/updating on the external Ethernet ports. This is a development patch I made, and it should only be used for testing. I'll be working with our driver developer to try to get this change integrated into our future releases.

    This patch shouldn't be absolutely necessary to get bonding to work but it does correct an issue that I saw when unplugging/plugging the EVM into three different switches repeatedly.

    Jason Reeder

    0001-No-learn-in-the-CPSW-for-dual-emac-mode.patch
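    For a board without an interactive console, one alternative to patching the driver is a small user-space program started from a boot script after bond0 and its slaves are configured. The sketch below makes the same change as the devmem2 commands, but as a read-modify-write through /dev/mem, so the other PORTCTL fields (including PORT_STATE) are preserved rather than overwritten with 0x33. It assumes root privileges, a kernel that allows /dev/mem access to the CPSW register range, and the same ALE offsets and bit positions as the devmem2 commands. ale_disable_learning and portctl_no_learn are hypothetical names, not part of TI's patch. It has not been verified whether the driver rewrites these registers when an interface is reopened, so re-running it after an interface is brought down and up may be needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define ALE_BASE      0x4A100D00u   /* CPSW ALE base on AM335x (assumed) */
#define ALE_PORTCTL   0x40u         /* PORTCTL0 offset; PORTCTLn = +4*n */
#define NO_LEARN      (1u << 4)
#define NO_SA_UPDATE  (1u << 5)

/* Set NO_LEARN and NO_SA_UPDATE, leaving every other field untouched. */
uint32_t portctl_no_learn(uint32_t old)
{
    return old | NO_LEARN | NO_SA_UPDATE;
}

/* Read-modify-write PORTCTL<port> for an external port (1 or 2) via /dev/mem. */
int ale_disable_learning(unsigned port)
{
    long page;
    off_t base;
    size_t off;
    int fd;
    void *map;
    volatile uint32_t *reg;

    assert(port == 1 || port == 2);          /* external ports only */
    page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    /* mmap offsets must be page aligned, so map the page holding the ALE. */
    base = (off_t)ALE_BASE & ~((off_t)page - 1);
    off = (size_t)(ALE_BASE - (uint32_t)base) + ALE_PORTCTL + 4u * port;

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd);                               /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;
    reg = (volatile uint32_t *)((char *)map + off);
    *reg = portctl_no_learn(*reg);
    munmap(map, (size_t)page);
    return 0;
}
```

    A boot script would call ale_disable_learning(1) and ale_disable_learning(2) from a tiny main(). Changing the driver, as the attached patch does, remains the more robust route because the setting is then applied by the driver itself.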

  • Hi Jason,

    The patch seems to be working. I had to modify it slightly because it didn't apply cleanly; some defines were missing in my code. But the failback seems to be working now. With the CISCO switch, there's a slight delay when switching over (10-15 seconds), but it never hangs like it did before.

    Anyway, thank you very much for your help. I realize you had to spend some time setting this up. I appreciate it.

    Sincerely,
    Everett
  • Hi Everett,

    Sorry, I only just came across your post; I have the same issue when communicating through a CISCO router. Our CISCO routers are set by default not to allow VLAN tag 0. I saw the patch, but I'm not sure how to apply it. Sorry, it must be a basic thing (which might explain why no one else asked), but I'm pretty new to this. Would you be so kind as to tell me exactly how?

    George