am3874: Multicast bug in cpsw.c driver (avahi/mDNS dropout, dual_emac mode)

Other Parts Discussed in Thread: AM3874

Hi all,

I'm debugging a problem with an AM3874 board that advertises its presence via mDNS (running Avahi). I'm using a kernel and compiler derived from the SDK, with a non-stock userspace. That's a big stack, so I'll describe only the underlying issue. My question (jump to the bottom!) concerns a TI kernel patch, and whether I should backport it from the linux-am33x tree to 2.6.37.

On boot-up, my CPSW does not respond to mDNS queries. Using Wireshark, I can see the queries travel from my PC to the AM3874, but no response is generated. Both queries and replies should be multicast.

After several minutes, some unrelated activity from my PC causes the CPSW to add an additional ALE entry:

index 12, raw: 00000004 7002ecf4 bb33b911, type: vlan+addr(3), vlan: 2, addr: ec:f4:bb:33:b9:11, uctype: untouched(1), port: 1

After this ALE entry appears (it seems to be either a DHCP or an ARP packet that triggers it), multicast suddenly works and mDNS queries are answered properly.

I've backported this commit from the linux-am33x tree:

https://gitorious.org/am335x/linux-am33x/commit/1a4213326177b5214ee98c8d2a8e431475a881b2

...and it also appears to fix the issue in 2.6.37. I'm wondering if this is the expected behaviour before and after this patch, and whether I can commit it to my tree and carry on.

best,

Graeme

  • Hi all,

    For anyone else who finds themselves in the same boat: there is a bug in DUAL_EMAC mode in the 2.6.37 cpsw driver (and, I think, in newer kernels too). The eth0 and eth1 interfaces stomp on each other's multicast lists, because cpsw_ndo_set_multicast_list() flushes EVERY multicast entry from the ALE but restores only the filters belonging to the interface being configured.
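
    To make the failure mode concrete, here is a simplified model in C. It is not driver code: the table, the helper names and the single-address "lists" are hypothetical, VLANs are ignored, and it assumes ALE port 0 is the host port with eth0 and eth1 on slave ports 1 and 2. The pre-patch path flushes every port's membership and re-adds only the calling interface's list; the per-port path clears and re-adds only the caller's own port bit.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical ALE multicast model: port 0 = host, 1 = eth0, 2 = eth1. */
    #define HOST_PORT   0
    #define ALL_PORTS   0x7u
    #define MAX_ENTRIES 8

    struct mc_entry {
        int used;
        uint8_t addr[6];
        uint32_t port_mask;
    };

    static struct mc_entry ale[MAX_ENTRIES];

    /* Add addr, OR-ing port_mask into an existing entry if there is one. */
    static void add_mcast(const uint8_t *addr, uint32_t port_mask)
    {
        int i, free_idx = -1;

        for (i = 0; i < MAX_ENTRIES; i++) {
            if (ale[i].used && memcmp(ale[i].addr, addr, 6) == 0) {
                ale[i].port_mask |= port_mask;
                return;
            }
            if (!ale[i].used && free_idx < 0)
                free_idx = i;
        }
        if (free_idx >= 0) {
            ale[free_idx].used = 1;
            memcpy(ale[free_idx].addr, addr, 6);
            ale[free_idx].port_mask = port_mask;
        }
    }

    /* Clear port_mask bits everywhere; free entries with no slave port left. */
    static void flush_mcast(uint32_t port_mask)
    {
        for (int i = 0; i < MAX_ENTRIES; i++) {
            uint32_t mask;

            if (!ale[i].used)
                continue;
            mask = ale[i].port_mask & ~port_mask;
            if ((mask & ~(1u << HOST_PORT)) == 0)
                ale[i].used = 0;
            else
                ale[i].port_mask = mask;
        }
    }

    /* Return 1 if frames addressed to addr would be forwarded to port. */
    static int forwards(const uint8_t *addr, int port)
    {
        for (int i = 0; i < MAX_ENTRIES; i++)
            if (ale[i].used && memcmp(ale[i].addr, addr, 6) == 0)
                return (int)((ale[i].port_mask >> port) & 1u);
        return 0;
    }

    /* Pre-patch: flush every port, then re-add only the caller's list. */
    static void set_mc_list_old(const uint8_t *addr)
    {
        flush_mcast(ALL_PORTS << HOST_PORT);
        add_mcast(addr, ALL_PORTS << HOST_PORT);
    }

    /* Per-port: flush and re-add only the caller's own slave port bit. */
    static void set_mc_list_new(int slave_port, const uint8_t *addr)
    {
        flush_mcast(1u << slave_port);
        add_mcast(addr, (1u << slave_port) | (1u << HOST_PORT));
    }

    static const uint8_t mdns[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0xfb };
    static const uint8_t all_hosts[6] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x01 };

    int main(void)
    {
        /* eth0 (port 1) joins mDNS, then eth1 (port 2) programs its list. */
        set_mc_list_old(mdns);
        set_mc_list_old(all_hosts);
        printf("old: mDNS forwarded to eth0: %d\n", forwards(mdns, 1));

        memset(ale, 0, sizeof(ale));
        set_mc_list_new(1, mdns);
        set_mc_list_new(2, all_hosts);
        printf("new: mDNS forwarded to eth0: %d\n", forwards(mdns, 1));
        return 0;
    }
    ```

    The model prints 0 for the old scheme (eth1's update wipes out eth0's mDNS entry) and 1 for the per-port scheme, where eth1's flush only clears bit 2 and leaves eth0's entry intact.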

    My patch (lightly tested) follows:

    4786.0001-CPSW-Fix-multicast-list-collisions-between-eth0-and-.txt
    From fd8d1a45544af41a1d8b69f97b9d6c68ad3eebf9 Mon Sep 17 00:00:00 2001
    From: Graeme Smecher <gsmecher@threespeedlogic.com>
    Date: Wed, 3 Sep 2014 12:18:07 -0700
    Subject: [PATCH] CPSW: Fix multicast list collisions between eth0 and eth1 in
     DUAL_EMAC mode.
    
    The problem is in cpsw_ndo_set_multicast_list(): this code
    
    	* deletes ALL multicast entries, and
    	* restores entries for the CURRENT slave only.
    
    There's a brief discussion on linux-netdev (with VN Mugunthan) and an e2e
    post (http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/365586/reply.aspx)
    with some more details.
    ---
     drivers/net/cpsw.c     | 19 +++++++++++--------
     drivers/net/cpsw_ale.c |  2 +-
     2 files changed, 12 insertions(+), 9 deletions(-)
    
    diff --git a/drivers/net/cpsw.c b/drivers/net/cpsw.c
    index 21eee0e..240d51a 100644
    --- a/drivers/net/cpsw.c
    +++ b/drivers/net/cpsw.c
    @@ -2384,23 +2384,26 @@ static int cpsw_ndo_do_ioctl(struct net_device *ndev, struct ifreq *ifrq,
     static void cpsw_ndo_set_multicast_list(struct net_device *ndev)
     {
     	struct cpsw_priv *priv = netdev_priv(ndev);
    +	struct cpsw_slave *slave = priv->slaves + priv->emac_port;
    +	int slave_port = cpsw_get_slave_port(priv, slave->slave_num);
    +
    +	/* Clear all mcast from ALE */
    +	cpsw_ale_flush_multicast(priv->ale, 1 << slave_port);
     
     	if (!netdev_mc_empty(ndev)) {
     		struct netdev_hw_addr *ha;
     
    -		/* Clear all mcast from ALE */
    -		cpsw_ale_flush_multicast(priv->ale,
    -				ALE_ALL_PORTS << priv->host_port);
    -
     		/* program multicast address list into ALE register */
     		netdev_for_each_mc_addr(ha, ndev) {
    +#ifdef CONFIG_TI_CPSW_DUAL_EMAC
    +			cpsw_ale_vlan_add_mcast(priv->ale, (u8 *)ha->addr,
    +				1 << slave_port | 1 << priv->host_port,
    +				slave->port_vlan, 0, 0);
    +#else
     			cpsw_ale_add_mcast(priv->ale, (u8 *)ha->addr,
     				ALE_ALL_PORTS << priv->host_port, 0, 0);
    +#endif
     		}
    -	} else {
    -		/* Clear all mcast from ALE */
    -		cpsw_ale_flush_multicast(priv->ale,
    -				ALE_ALL_PORTS << priv->host_port);
     	}
     }
     
    diff --git a/drivers/net/cpsw_ale.c b/drivers/net/cpsw_ale.c
    index c6ade83..7f78c26 100644
    --- a/drivers/net/cpsw_ale.c
    +++ b/drivers/net/cpsw_ale.c
    @@ -242,7 +242,7 @@ static void cpsw_ale_flush_mcast(struct cpsw_ale *ale, u32 *ale_entry,
     	mask &= ~port_mask;
     
     	/* free if only remaining port is host port */
    -	if (mask == BIT(ale->ale_ports))
    +	if (mask == BIT(0))
     		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
     	else
     		cpsw_ale_set_port_mask(ale_entry, mask);
    -- 
    2.0.1
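
    The cpsw_ale.c hunk changes the test that decides when a multicast entry can be freed. Assuming the host is ALE port 0 and ale->ale_ports is the total number of ports (3 for the host plus two slaves), BIT(ale->ale_ports) is 0x8, which lies outside a 3-bit port mask, so the old comparison can never be true and an entry whose only remaining member is the host port is kept rather than freed. A minimal sketch of both checks; `freed_old()` and `freed_new()` are hypothetical names:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BIT(n) (1u << (n))

    /* Assumptions: host is ALE port 0; ale_ports counts host + 2 slaves. */
    enum { HOST_PORT = 0, ALE_PORTS = 3 };

    /* Pre-patch test: compares the remaining mask against BIT(ale_ports). */
    static int freed_old(uint32_t entry_mask, uint32_t port_mask)
    {
        uint32_t mask = entry_mask & ~port_mask;

        return mask == BIT(ALE_PORTS);
    }

    /* Patched test: free when only the host port (bit 0) remains. */
    static int freed_new(uint32_t entry_mask, uint32_t port_mask)
    {
        uint32_t mask = entry_mask & ~port_mask;

        return mask == BIT(HOST_PORT);
    }

    int main(void)
    {
        /* Entry shared by the host and slave port 1; eth0 flushes port 1. */
        uint32_t entry = BIT(HOST_PORT) | BIT(1);

        printf("old check frees entry: %d\n", freed_old(entry, BIT(1)));
        printf("new check frees entry: %d\n", freed_new(entry, BIT(1)));
        return 0;
    }
    ```

    Under these assumptions the old check prints 0 and the new one prints 1; an entry still shared with the other slave port (mask 0x7 minus bit 1) is not freed by either.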
    
    

    best,

    Graeme

  • Hi all,

    Before I flag my question as answered: this is just another in a series of patches to TI's 2.6.37 branch that haven't been integrated upstream. To pick a few examples:

    • http://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/p/252033/968627.aspx#968627
    • https://lkml.org/lkml/2013/11/18/356
    • https://lists.yoctoproject.org/pipermail/linux-yocto/2013-July/000869.html

    After spending a substantial amount of time tracking down bugs like these, I find it frustrating to see the fixes ignored by TI. Is there any path toward merging bugfixes like these into 2.6.37? If not, why not?

    Keeping bugfixes out-of-tree wastes your time and ours, since people are likely to keep running into the same problems. It also turns one of the perks of my job (public code contributions) into a process I resent.

    thanks,

    Graeme