Linux/AM3356: Performance problem when using VLAN

Part Number: AM3356


Tool/software: Linux

I'm seeing a bandwidth performance problem when using a VLAN on our AM335x-based board.

Here are the details of my setup:

  • Linux kernel 4.1.15
  • CPSW is configured for switch mode.
  • Both external switch ports are populated as gigabit ports.
  • I'm using systemd-networkd to set up VLAN 12 as eth0.12.
  • I'm using the script below to configure the CPSW to enable the VLAN.

The problem is that when both external ports have active links (e.g. port 1 connected to a dumb switch and port 2 connected to a PC or another AM335x board), the TX bandwidth from the AM335x drops sharply (for example, from 150 Mbits/sec to 1.5 Mbits/sec). This occurs on both the eth0 and eth0.12 interfaces and is very repeatable. The amount of bandwidth lost seems to vary from board to board. I can eliminate the problem by either disconnecting one of the active links or removing the VLAN interface from the systemd-networkd configuration.

I'm using iperf to measure the bandwidth, and the problem only seems to affect the TX bandwidth numbers.

I'm really puzzled and hoping that someone can point me towards a solution or a thread to pull. 

Thanks, 
Matt S. 

Switch Config Script

#!/bin/sh
set -e

echo "switch-config BEGIN"

# Client network: VLAN 1000 disabled
switch-config --del-vlan 1000 || true

# System network: VLAN 12 on port 0 (CPU port), 1 and 2
#  - Member port                          = 7: Bit 0 - Host port/Port 0, Bit 1 - Slave 0/Port 1, Bit 2 - Slave 1/Port 2
#  - Untagged Egress port mask            = 0: Bit 0 - Host port/Port 0, Bit 1 - Slave 0/Port 1, Bit 2 - Slave 1/Port 2
#  - Registered Multicast flood port mask = 7: Bit 0 - Host port/Port 0, Bit 1 - Slave 0/Port 1, Bit 2 - Slave 1/Port 2
#  - Unknown Multicast flood port mask    = 0: Bit 0 - Host port/Port 0, Bit 1 - Slave 0/Port 1, Bit 2 - Slave 1/Port 2
switch-config --add-vlan        12 --port 7 --vid-untag 0 --reg-multi 7 --unreg-multi 0

# Port 0,1 and 2 are Trunk Ports (which see 802.1Q-tagged Ethernet frames), no port VLAN ID required
switch-config --set-port-vlan    0 --port 0
switch-config --set-port-vlan    0 --port 1
switch-config --set-port-vlan    0 --port 2

switch-config -d
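
The port masks in the script comments are plain bitmasks, so they can be derived from port numbers rather than typed by hand. Below is a minimal sketch, assuming a hypothetical helper named port_mask (it is not part of switch-config) and the bit layout given in the comments above: bit 0 is the host port, bit 1 is slave 0 (port 1), and bit 2 is slave 1 (port 2).

```sh
#!/bin/sh
# port_mask: hypothetical helper that builds a CPSW port bitmask
# from a list of port numbers (0 = host port, 1 and 2 = external ports).
port_mask() {
    mask=0
    for port in "$@"; do
        case "$port" in
            0|1|2) mask=$(( $mask | (1 << $port) )) ;;
            *) echo "port_mask: invalid port '$port'" >&2; return 1 ;;
        esac
    done
    echo "$mask"
}
```

With this helper, the member mask for ports 0, 1 and 2 could be written as `--port "$(port_mask 0 1 2)"` in the `switch-config --add-vlan` line.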

 

  • Hi,

    The TX traffic you mention, is this traffic originating from the ARM on the AM335x, or is it data passing through the switch? Could you please attach the results of switch-config -d? Could you also please describe the VLAN traffic flow that you are working on?

    The kernel version you listed in the post does not look like a version that TI released. Could you describe the source of the kernel? Did you start with a kernel configuration from a TI SDK? Also, is there any chance you could switch to a later kernel?

    Best Regards,
    Schuyler
  • Hi, 

    Yes, the TX traffic I am referring to originates from the ARM on the AM335x. Data passing through the switch from other devices does not seem to be affected.

    Here is the switch-config -d output with the VLAN allowed in the switch configuration.

    root@device:~# switch-config -d
    cpsw hw version 1.12 (0)
    0   : type: ucast, addr = f4:5e:ab:39:fa:33, ucast_type = persistant, port_num = 0x0
    1   : type: mcast, addr = ff:ff:ff:ff:ff:ff, mcast_state = f, no super, port_mask = 0x7
    2   : type: vlan , vid = 0, untag_force = 0x7, reg_mcast = 0x7, unreg_mcast = 0x7, member_list = 0x7
    3   : type: vlan , vid = 12, untag_force = 0x0, reg_mcast = 0x7, unreg_mcast = 0x0, member_list = 0x7

    It does not seem that the switch configuration actually matters much for this problem to occur, as I can leave it in the default configuration like so:

    root@device:~# switch-config -d
    cpsw hw version 1.12 (0)
    0   : type: ucast, addr = 50:33:8b:15:80:72, ucast_type = persistant, port_num = 0x0
    1   : type: mcast, addr = ff:ff:ff:ff:ff:ff, mcast_state = f, no super, port_mask = 0x7
    2   : type: vlan , vid = 0, untag_force = 0x7, reg_mcast = 0x7, unreg_mcast = 0x7, member_list = 0x7
    

    And the trunk interface (eth0) is equally affected simply by enabling the VLAN interface in systemd-networkd.

    The basics of what I'm trying to achieve with the VLAN is described in this thread here

    The kernel we are using is the one referenced by meta-ti commit id 963c35f (just over 2 years ago). It might not have been released as an SDK; it was from around the time of the Jethro release of the Yocto Project. If I remember correctly, TI did not make an official release for Jethro. I will see if I can update to a more recent, released kernel.

    Thanks,
    Matt S. 

  • I was able to test with Linux kernel 4.1.18 built using the meta-ti layer tag ti2015.03, and I'm seeing the same problem. It's going to take a fair amount of work to move to any newer kernel version.
  • Hi,
    Apologies, I didn't notice the other thread (even though I responded to it). Thanks for pointing out the relationship between the two posts and the connection to the bandwidth issue. I will close the other one and we will continue debugging here. I will try to set up what you have posted to see if I can reproduce the same drop in BW.

    Best Regards,
    Schuyler
  • Thank you for the response. By the way, I didn't mean to click the "This has resolved my issue" button on this thread, and I don't know how to undo it.
  • Hello Schuyler,

    I just wanted to confirm that this issue is still open and see if you had made any progress reproducing it. I'm somewhat concerned that it may be difficult to reproduce, as I'm seeing different behavior on different boards. That doesn't make much sense, since on all boards the problem goes away if I simply do not configure the VLAN interface.

    I've also been trying to update to kernel 4.4 but haven't had much luck. 

    Thanks,
    Matt S. 

  • Hi Matt,

    I am still working on the issue that you are seeing. Can you provide the steps that you are using to set up iperf? Could you also attach the ifconfig -a output? Do you have a Wireshark capture of the traffic leaving the AM3356?

    Best Regards,
    Schuyler
  • Hi Schuyler,

    For ifconfig:

    I don't have ifconfig installed in my image right now. Here is the output from ip a; let me know if you need more information.

    root@vc101_fa33:~# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel qlen 1000
        link/ether f4:5e:ab:39:fa:33 brd ff:ff:ff:ff:ff:ff
        inet 192.168.12.12/24 brd 192.168.12.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::f65e:abff:fe39:fa33/64 scope link
           valid_lft forever preferred_lft forever
    3: eth0.12@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
        link/ether 02:8b:a3:11:33:dd brd ff:ff:ff:ff:ff:ff
        inet 169.254.2.101/16 brd 169.254.255.255 scope global eth0.12
           valid_lft forever preferred_lft forever
        inet6 fe80::8b:a3ff:fe11:33dd/64 scope link
           valid_lft forever preferred_lft forever


    For iperf:

    - Build iperf to run on the AM3356; there is a Yocto/OpenEmbedded recipe for it.
    - Copy the iperf executable to the board under test.
    - On the board under test, set the trunk Ethernet device IP to 192.168.12.3/24.
    - On the board under test, set the VLAN device IP to 169.254.12.3/16.
    - Install iperf on a laptop: `sudo apt-get install iperf`.
    - Set up the laptop to join the same VLAN as the board under test, and make sure the Ethernet device on the laptop supports VLAN tagging.
    - On the laptop, set the trunk Ethernet device IP to 192.168.12.2/24.
    - On the laptop, set the VLAN device IP to 169.254.12.2/16.
    - Make sure the board under test is connected to the laptop via Ethernet.
    - Make sure the other Ethernet port on the board under test has an active link; a dumb Gbit switch seems to be enough.
    - On the laptop, run iperf in server mode: iperf -s
    - On the board under test, run iperf in client mode, connecting to the server instance over the trunk LAN: iperf -c 192.168.12.2
      Observe the bandwidth numbers.
    - On the board under test, run iperf in client mode, connecting to the server instance over the VLAN: iperf -c 169.254.12.2
      Observe the bandwidth numbers.
    - If you have reproduced my problem, the bandwidth numbers for the VLAN and the trunk LAN should be similar to each other and lower than expected.

    You should be able to do one of the following and repeat the iperf test and get considerably higher bandwidth numbers:
    - de-configure the VLAN interface on the board under test.
    - Disconnect the second Ethernet port on the board under test
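
To compare runs across these configurations without reading the numbers by eye, the iperf summary line can be reduced to a single value. This is a minimal sketch, assuming a hypothetical helper named iperf_kbits, the classic iperf (version 2) client report format such as `[  3]  0.0-10.0 sec   180 MBytes   151 Mbits/sec`, and decimal prefixes for the bit rates. None of this comes from the original posts.

```sh
#!/bin/sh
# iperf_kbits: hypothetical helper. Reads iperf client output on stdin
# and prints the bandwidth of the last report line in Kbits/sec.
# Assumes lines ending in "<number> [KMG]bits/sec" (iperf 2 style).
iperf_kbits() {
    awk '
        /bits\/sec/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[KMG]?bits\/sec$/) {
                    value = $(i - 1)          # numeric field before the unit
                    unit = substr($i, 1, 1)   # K, M, G, or b for plain bits
                }
            }
        }
        END {
            if (unit == "G") value *= 1000000
            else if (unit == "M") value *= 1000
            else if (unit == "b") value /= 1000
            printf "%.0f\n", value
        }
    '
}
```

On the board under test this could be used as `iperf -c 192.168.12.2 | iperf_kbits` for the trunk LAN and `iperf -c 169.254.12.2 | iperf_kbits` for the VLAN, once with and once without the second link connected.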


    For Wireshark:

    I am working on getting a capture, maybe later today.

  • Hi Matt,

    Could you please post the steps that you are using to create the VLAN interface? I am assuming that you are using ip commands to set it up. No matter how I set it up, I have several more entries in the ALE than you have.

    Best Regards,
    Schuyler
  • Hi Schuyler,

    I am using systemd-networkd to set up the VLAN; we are using systemd version 225.

    Below are the configuration files used:

    /etc/systemd/network/eth0.network

    [Match]
    Name=eth0
    
    [Network]
    DHCP=ipv4
    LinkLocalAddressing=ipv6
    VLAN=eth0.12
    
    [Address]
    Address=192.168.12.3/24

    /etc/systemd/network/eth0-vlan.netdev

    [NetDev]
    Name=eth0.12
    Kind=vlan
    
    [VLAN]
    Id=12
    

    /etc/systemd/network/eth0.12.network

    [Match]
    Name=eth0.12
    [Network]
    LinkLocalAddressing=ipv6
    ConfigureWithoutCarrier=true
    [Address]
    Address=169.254.2.101/16

  • Oops, I posted the wrong eth0.12.network file. Here is the correct one:

    [Match]
    Name=eth0.12
    [Network]
    LinkLocalAddressing=ipv6
    [Address]
    Address=169.254.12.3/16
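
For anyone trying to reproduce this without systemd-networkd, the .netdev and .network files above correspond roughly to a few iproute2 commands. This is a minimal sketch, not the configuration used in the thread: vlan_cmds is a hypothetical helper that only prints the commands, so they can be reviewed before being run as root. A VLAN device created with `ip link add` normally inherits the MAC address of its parent, which may differ from what systemd-networkd produces and is worth checking with `ip a`.

```sh
#!/bin/sh
# vlan_cmds: hypothetical helper that prints (does not run) iproute2
# commands roughly equivalent to the eth0-vlan.netdev and
# eth0.12.network files.
# Arguments: PARENT_INTERFACE VLAN_ID ADDRESS/PREFIX
vlan_cmds() {
    parent=$1
    vid=$2
    addr=$3
    dev="$parent.$vid"
    echo "ip link add link $parent name $dev type vlan id $vid"
    echo "ip addr add $addr dev $dev"
    echo "ip link set dev $dev up"
}
```

For the setup in this thread, `vlan_cmds eth0 12 169.254.12.3/16` prints the three commands; piping the output to `sh -x` as root would apply them.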
    

  • Hello Schuyler, 

    Attached is a zip file with some Wireshark captures of the board under test in different configurations. I've also included text copies of the console at the time of each capture, with the ip and switch configurations reported. Hopefully the file names are descriptive enough that you can understand what is what.

    I used a managed switch with port mirroring between the board under test and the system running the iperf server to capture the traffic on a separate PC. I hope I set it up correctly: I had both the board_under_test and server RX and TX traffic mirrored to the capture PC, so I'm concerned that the traffic might have been captured twice.

    I haven't had a chance to really analyze the data, but I suspect the interesting file will be with-vlan12-and-2-active-links.pcapng. During that capture the bandwidth was measured at only 304 Kbits/sec, whereas all of the others were over 300 Mbits/sec. I'll have more of a chance to analyze the data tomorrow.

    I should also note that all Ethernet links involved in this test setup are connecting at gigabit speed. I feel like I've seen different results if the second port on the board under test is connected to a 10/100 switch, but I'm not sure.

    (Hmm, I have not tried forcing all connections to 10/100 speeds to see what happens. I might have to try that tomorrow.)

    Regards,
    Matt S.  

    network_capture.zip

  • Hi Matt,

    Thank you for the network capture. I have a setup that appears to be working, but I am not seeing the BW loss that you are. I used vconfig instead of systemd to configure the VLAN.

    - How much traffic is passing through the switch?
    - One earlier post shows the network and VLAN interfaces with different MAC addresses. Is it possible to leave the MAC addresses the same for the ARM VLANs? Having a separate MAC address puts the port into promiscuous mode. If the traffic is high enough, this might be the cause of the drop in performance.
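
Whether the two interfaces share a MAC address can be checked directly on the board. This is a minimal sketch, assuming the standard Linux sysfs path /sys/class/net/<interface>/address; same_mac and iface_mac are hypothetical helper names introduced here for illustration.

```sh
#!/bin/sh
# same_mac: hypothetical helper. Succeeds (exit status 0) when two MAC
# address strings are equal, ignoring upper/lower case.
same_mac() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$a" = "$b" ]
}

# iface_mac: hypothetical helper. Prints the MAC address of an
# interface as reported by sysfs.
iface_mac() {
    cat "/sys/class/net/$1/address"
}
```

For example, `same_mac "$(iface_mac eth0)" "$(iface_mac eth0.12)" || echo "MAC addresses differ"`. If they differ, one way to test the suggestion would be to set the VLAN device's address to match the parent with `ip link set dev eth0.12 address <MAC of eth0>` and repeat the iperf run.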

    Best Regards,
    Schuyler
  • Hi Schuyler, 

    Sorry for the delay in responding; I have not had a chance to get back to this for a couple of days.
    To answer your questions:

    - How much traffic is passing through the switch?
    I assume you are asking about the CPSW switch on the AM335x. For the Wireshark captures I provided, the only traffic is what you see in the captures; no other devices were connected to the switch.

    - Is it possible to leave the MAC address the same for both the ARM VLANs?
    I assume that you mean leaving the MAC address the same for both the trunk Ethernet device eth0 (aka ARM) and the VLAN Ethernet device eth0.12.
    This would be the difference between an ipvlan (your suggestion) and a macvlan (my current implementation) as described here, am I correct?
    I have not tried this; I will try it in the next couple of days.

    Matt S. 

  • Hi Matt,

    We were wondering whether there was other traffic present when the macvlan was used. It puts the host port into promiscuous mode, and any traffic in the switch will go to the ARM, which would bog it down. Does your product require using macvlans? Since you are implementing the VLAN configs through systemd, do you see any messages about promiscuous mode being enabled anywhere in the console log or the systemd network log?
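
Besides the logs, the promiscuous state can be read from the interface flags. This is a minimal sketch, assuming the standard Linux sysfs file /sys/class/net/<interface>/flags, which holds the flags as a hexadecimal value, and the IFF_PROMISC bit value 0x100 from linux/if.h. The helper names flags_promisc and iface_promisc are hypothetical.

```sh
#!/bin/sh
# flags_promisc: hypothetical helper. Succeeds (exit status 0) when the
# IFF_PROMISC bit (0x100) is set in a flags value such as 0x1103.
flags_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# iface_promisc: hypothetical helper. Reads the flags of an interface
# from sysfs and tests the promiscuous bit.
iface_promisc() {
    flags_promisc "$(cat "/sys/class/net/$1/flags")"
}
```

For example, `iface_promisc eth0 && echo "eth0 is in promiscuous mode"`. The kernel usually also logs a message when a device enters promiscuous mode, so `dmesg | grep -i promisc` is a quick second check.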

    Best Regards,
    Schuyler
  • Hi Matt,
    I am closing the thread for now; if you need to re-open it, you can.
    Best Regards,
    Schuyler
  • Hi Schuyler, 

    I would like to keep this thread open. I was on holiday for the last couple of weeks and could not work on this issue.

    I am hoping to get back to testing your last suggestion in the next couple of days. 

    Matt S.