
TDA4VM: eth0 disconnects after a few hours

Part Number: TDA4VM

SDK vision: 8.2

My board has a TDA4 connected to an SJA1105 (an Ethernet switch with RGMII) through MCU_CPSW (RGMII). The SJA1105 port is an RGMII 1000M fixed link with TX and RX delays, and MCU_CPSW is also configured as a 1000M fixed link.

The SJA1105 also connects to an NVIDIA Orin. The TDA4 and the Orin communicate through middleware. After a few hours, the TDA4 can no longer receive frames from the Orin, and ping gets no response.

Here is the connection:

However, after I bring eth0 down and up again on the TDA4, the network returns to normal.

Here are the TDA4 register values when the issue occurred:

Read at address 0x46000000 : 0x6BA00101
Read at address 0x46000004 : 0x00000000
Read at address 0x46000008 : 0x00000000
Read at address 0x4600000C : 0x00000000
Read at address 0x46000010 : 0x00000001
Read at address 0x46000018 : 0x00000000
Read at address 0x4600001C : 0x00000000
Read at address 0x46000100 : 0x4EC21102
Read at address 0x46000104 : 0x00000000
Read at address 0x46000110 : 0x00000000
Read at address 0x46000114 : 0x00000030
Read at address 0x46000118 : 0x00000000
Read at address 0x4600011C : 0x00000000
Read at address 0x46000120 : 0x00000000
Read at address 0x46000124 : 0x00000000
Read at address 0x46000130 : 0x00000000
Read at address 0x46000134 : 0x00000000
Read at address 0x46000138 : 0x00000000
Read at address 0x46000140 : 0x00000000
Read at address 0x46000144 : 0x00000000
Read at address 0x46000148 : 0x00000000

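The values above have the "Read at address" format of a devmem-style tool. Assuming such a tool is not available on the target, the same 32-bit reads can be done from user space by mapping /dev/mem around the register. This is a minimal sketch; `read_reg32` is a hypothetical name, and it needs root plus a kernel configuration that allows /dev/mem access to this physical range.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one 32-bit register at physical address `phys` by mapping `path`
 * (normally "/dev/mem") around it. Returns 0 on success, -1 on failure. */
static int read_reg32(const char *path, off_t phys, uint32_t *out)
{
    assert(path != NULL && out != NULL);
    assert((phys & 3) == 0);                     /* registers are 32-bit aligned */

    long page = sysconf(_SC_PAGESIZE);
    off_t base = phys & ~((off_t)page - 1);      /* mmap offset must be page aligned */

    int fd = open(path, O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, base);
    close(fd);                                   /* the mapping outlives the fd */
    if (map == MAP_FAILED)
        return -1;

    *out = *(volatile uint32_t *)((char *)map + (phys - base));
    munmap(map, (size_t)page);
    return 0;
}
```

For example, `read_reg32("/dev/mem", 0x46000000, &v)` reads the first register listed in the dump above.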
  • Hi,

    May I know which SDK version you are using?

    Please also share the debug logs if you are using the native Linux driver on the A72.

    Best Regards,
    Sudheer

  • Hi,

    The SDK version is 8.2.

  • Here is the log captured with ethtool:

    0363.eth0.log
    NIC statistics:
    p0_rx_good_frames: 1504055
    p0_rx_broadcast_frames: 8526
    p0_rx_multicast_frames: 38846
    p0_rx_crc_errors: 0
    p0_rx_oversized_frames: 0
    p0_rx_undersized_frames: 0
    p0_ale_drop: 0
    p0_ale_overrun_drop: 0
    p0_rx_octets: 372298592
    p0_tx_good_frames: 10935015
    p0_tx_broadcast_frames: 12
    p0_tx_multicast_frames: 28159
    p0_tx_octets: 2943797403
    p0_tx_64B_frames: 11935
    p0_tx_65_to_127B_frames: 1153624
    p0_tx_128_to_255B_frames: 1712456
    p0_tx_256_to_511B_frames: 838660
    p0_tx_512_to_1023B_frames: 259035

  • Hi,


    > Please also share the debug logs if you are using the native Linux driver on the A72.

    As requested above, can you please share the A72 Linux terminal log?

    Also, capture the ethtool statistics several times (3 to 4 captures, a few seconds apart) while the issue is observed.

    Best Regards,
    Sudheer

  • Hi,

    Here is the dmesg log:

    5238.dmesg.log

    The following logs were captured with ethtool while the issue was observed.

    tda4_eth0_1.log
    NIC statistics:
    p0_rx_good_frames: 12447903
    p0_rx_broadcast_frames: 102
    p0_rx_multicast_frames: 120005
    p0_rx_crc_errors: 0
    p0_rx_oversized_frames: 0
    p0_rx_undersized_frames: 0
    p0_ale_drop: 0
    p0_ale_overrun_drop: 0
    p0_rx_octets: 65208114
    p0_tx_good_frames: 242586096
    p0_tx_broadcast_frames: 16
    p0_tx_multicast_frames: 91820
    p0_tx_octets: 2613294098
    p0_tx_64B_frames: 10601
    p0_tx_65_to_127B_frames: 11885401
    p0_tx_128_to_255B_frames: 5088070
    p0_tx_256_to_511B_frames: 6969316
    p0_tx_512_to_1023B_frames: 1529787

    tda4_eth0_2.log
    NIC statistics:
    p0_rx_good_frames: 12447916
    p0_rx_broadcast_frames: 102
    p0_rx_multicast_frames: 120018
    p0_rx_crc_errors: 0
    p0_rx_oversized_frames: 0
    p0_rx_undersized_frames: 0
    p0_ale_drop: 0
    p0_ale_overrun_drop: 0
    p0_rx_octets: 65213912
    p0_tx_good_frames: 242586096
    p0_tx_broadcast_frames: 16
    p0_tx_multicast_frames: 91820
    p0_tx_octets: 2613294098
    p0_tx_64B_frames: 10601
    p0_tx_65_to_127B_frames: 11885404
    p0_tx_128_to_255B_frames: 5088070
    p0_tx_256_to_511B_frames: 6969316
    p0_tx_512_to_1023B_frames: 1529797

    tda4_eth0_3.log
    NIC statistics:
    p0_rx_good_frames: 12447920
    p0_rx_broadcast_frames: 102
    p0_rx_multicast_frames: 120022
    p0_rx_crc_errors: 0
    p0_rx_oversized_frames: 0
    p0_rx_undersized_frames: 0
    p0_ale_drop: 0
    p0_ale_overrun_drop: 0
    p0_rx_octets: 65216152
    p0_tx_good_frames: 242586096
    p0_tx_broadcast_frames: 16
    p0_tx_multicast_frames: 91820
    p0_tx_octets: 2613294098
    p0_tx_64B_frames: 10601
    p0_tx_65_to_127B_frames: 11885404
    p0_tx_128_to_255B_frames: 5088070
    p0_tx_256_to_511B_frames: 6969316
    p0_tx_512_to_1023B_frames: 1529801

    tda4_eth0_4.log
    NIC statistics:
    p0_rx_good_frames: 12447928
    p0_rx_broadcast_frames: 102
    p0_rx_multicast_frames: 120030
    p0_rx_crc_errors: 0
    p0_rx_oversized_frames: 0
    p0_rx_undersized_frames: 0
    p0_ale_drop: 0
    p0_ale_overrun_drop: 0
    p0_rx_octets: 65218656
    p0_tx_good_frames: 242586096
    p0_tx_broadcast_frames: 16
    p0_tx_multicast_frames: 91820
    p0_tx_octets: 2613294098
    p0_tx_64B_frames: 10601
    p0_tx_65_to_127B_frames: 11885408
    p0_tx_128_to_255B_frames: 5088070
    p0_tx_256_to_511B_frames: 6969316
    p0_tx_512_to_1023B_frames: 1529805

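Comparing captures is easier when the counters are diffed instead of read by eye. This is a minimal sketch, assuming the `name: value` line format that `ethtool -S` prints; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One counter from `ethtool -S <iface>` output. */
struct stat_entry {
    char name[64];
    unsigned long long value;
};

/* Parse a line such as "    p0_rx_good_frames: 12447903".
 * Returns 1 on success, 0 for lines that are not "name: value" counters
 * (for example the "NIC statistics:" header). */
static int parse_stat_line(const char *line, struct stat_entry *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, " %63[^:]: %llu", out->name, &out->value) == 2;
}

/* Compute after - before for the same counter taken from two captures.
 * Returns 1 and stores the difference in *delta, or 0 if either line does
 * not parse or the two lines name different counters. */
static int stat_delta(const char *before, const char *after, long long *delta)
{
    struct stat_entry a, b;

    if (!parse_stat_line(before, &a) || !parse_stat_line(after, &b))
        return 0;
    if (strcmp(a.name, b.name) != 0)
        return 0;
    *delta = (long long)(b.value - a.value);
    return 1;
}
```

Applied to tda4_eth0_1.log and tda4_eth0_2.log above, p0_rx_good_frames moves by 13 while p0_tx_good_frames stays at 242586096.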
  • Hi,

    From the statistics, I can see that both rx_good_frames and tx_good_frames increase between captures.
    We can also see some ALE drops; these may be packets whose MAC addresses are not registered in the ALE.

    The dmesg log continuously shows "drop_caches: 3"; I am not sure what it is, so please check at your end.
    Also, the dmesg log does not show the Arago banner and the root login prompt. Is the log complete?

    Do you see any error messages from the am65-cpsw-nuss driver when ping stops working?

    Best Regards,
    Sudheer

  • Hi,

    The dmesg log is complete.

    Currently, the dmesg log does not show any errors when this issue happens.

  • Hi,

    I could not see the Arago banner or the root login prompt in your logs, so I thought the log was incomplete.

    > Currently, the dmesg log does not show any errors when this issue happens.

    But on the CPSW side, we can see some packets being received and transmitted.

    Can you please dump the ALE table in both the working and the non-working scenario, and check whether the MAC address of the remote client is present?

    Please refer to the FAQ "How to print ALE table" for dumping the ALE table.

    Best Regards,
    Sudheer

  • Hi,

    Here is the ALE table, dumped with "switch-config --ndev eth0 -d" while eth0 works normally:

    tda4_eth0_ale_reg_normal.log
    K3 cpsw dump version (1) len(6328)
    ALE table dump ents(64):
    0 : type: vlan , vid = 0, untag_force = 0x3, reg_mcast = 0x0, unreg_mcast = 0x0, member_list = 0x3
    1 : type: ucast, addr = 34:08:e1:59:c7:74, ucast_type = persistant, port_num = 0x0, Secure
    2 : type: mcast, vid = 0, addr = ff:ff:ff:ff:ff:ff, mcast_state = f, no super, port_mask = 0x3
    3 : type: mcast, addr = 01:00:5e:00:00:01, mcast_state = f, no super, port_mask = 0x1
    4 : type: mcast, addr = 01:00:5e:7f:00:01, mcast_state = f, no super, port_mask = 0x1
    5 : type: mcast, addr = 01:00:5e:00:00:fb, mcast_state = f, no super, port_mask = 0x1
    6 : type: mcast, addr = 33:33:00:00:00:01, mcast_state = f, no super, port_mask = 0x1
    7 : type: mcast, addr = 33:33:ff:59:c7:74, mcast_state = f, no super, port_mask = 0x1
    8 : type: mcast, addr = 01:00:5e:00:00:fc, mcast_state = f, no super, port_mask = 0x1
    9 : type: mcast, addr = 33:33:00:00:00:fb, mcast_state = f, no super, port_mask = 0x1
    10 : type: mcast, addr = 33:33:00:01:00:03, mcast_state = f, no super, port_mask = 0x1
    11 : type: mcast, addr = 01:00:5e:00:01:81, mcast_state = f, no super, port_mask = 0x1
    12 : type: mcast, addr = 01:00:5e:00:00:6b, mcast_state = f, no super, port_mask = 0x1

    And the ALE table when this issue happens:

    tda4_eth0_ale_reg_fail.log
    K3 cpsw dump version (1) len(6328)
    ALE table dump ents(64):
    0 : type: mcast, addr = 33:33:00:00:00:01, mcast_state = f, no super, port_mask = 0x1
    1 : type: ucast, addr = 34:08:e1:59:c7:74, ucast_type = persistant, port_num = 0x0, Secure
    2 : type: mcast, vid = 0, addr = ff:ff:ff:ff:ff:ff, mcast_state = f, no super, port_mask = 0x3
    3 : type: mcast, addr = 01:00:5e:00:00:01, mcast_state = f, no super, port_mask = 0x1
    4 : type: mcast, addr = 01:00:5e:00:00:fc, mcast_state = f, no super, port_mask = 0x1
    5 : type: mcast, addr = 33:33:ff:59:c7:74, mcast_state = f, no super, port_mask = 0x1
    6 : type: mcast, addr = 01:00:5e:00:00:fb, mcast_state = f, no super, port_mask = 0x1
    7 : type: mcast, addr = 33:33:00:00:00:fb, mcast_state = f, no super, port_mask = 0x1
    8 : type: mcast, addr = 33:33:00:01:00:03, mcast_state = f, no super, port_mask = 0x1
    9 : type: mcast, addr = 01:00:5e:7f:00:01, mcast_state = f, no super, port_mask = 0x1
    10 : type: mcast, addr = 01:00:5e:00:01:81, mcast_state = f, no super, port_mask = 0x1
    11 : type: mcast, addr = 01:00:5e:00:00:6b, mcast_state = f, no super, port_mask = 0x1

    The MAC address of the remote client is 06:a3:88:62:e5:cc.

    I read the CPSW_ALE_CONTROL register (0x4603e008): 0x80000005.

    When the issue happens, if I set the ALE_VLAN_AWARE bit to 0, ping succeeds but ssh still fails.

    Best regards!
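To see which ALE features 0x80000005 enables, the register can be decoded bit by bit. The bit table below reflects the ALE control table in the kernel's cpsw_ale.c as I understand it (ALE_ENABLE at bit 31, ALE_VLAN_AWARE at bit 2, ALE_RATE_LIMIT at bit 0, and so on); treat it as an assumption and check it against the TRM description of CPSW_ALE_CONTROL. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ALE_CONTROL bit positions (assumed; verify against the TRM). */
struct ale_ctl_bit {
    unsigned shift;
    const char *name;
};

static const struct ale_ctl_bit ale_ctl_bits[] = {
    { 31, "ALE_ENABLE" },       { 30, "ALE_CLEAR" },         { 29, "ALE_AGEOUT" },
    {  8, "ALE_P0_UNI_FLOOD" }, {  7, "ALE_VLAN_NOLEARN" },  {  6, "ALE_NO_PORT_VLAN" },
    {  5, "ALE_OUI_DENY" },     {  4, "ALE_BYPASS" },        {  3, "ALE_RATE_LIMIT_TX" },
    {  2, "ALE_VLAN_AWARE" },   {  1, "ALE_AUTH_ENABLE" },   {  0, "ALE_RATE_LIMIT" },
};
#define N_CTL_BITS (sizeof ale_ctl_bits / sizeof ale_ctl_bits[0])

/* Return 1 if the named bit is set in val, 0 if clear, -1 if the name is unknown. */
static int ale_ctl_test(uint32_t val, const char *name)
{
    for (size_t i = 0; i < N_CTL_BITS; i++)
        if (strcmp(ale_ctl_bits[i].name, name) == 0)
            return (int)((val >> ale_ctl_bits[i].shift) & 1u);
    return -1;
}

/* Write the names of the set bits into buf, separated by spaces, highest bit
 * first. Returns the number of names written; stops early if buf is too small. */
static int ale_ctl_decode(uint32_t val, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < N_CTL_BITS; i++) {
        if (!(val & (UINT32_C(1) << ale_ctl_bits[i].shift)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", ale_ctl_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                              /* buffer full: output truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Decoding 0x80000005 this way gives ALE_ENABLE, ALE_VLAN_AWARE and ALE_RATE_LIMIT, which is consistent with the observation that clearing ALE_VLAN_AWARE lets ping through again once the VLAN 0 entry is gone.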

  • Hi,

    From the ALE dump, I can see that the VLAN 0 entry was removed from the ALE.

    Can you please confirm whether any VLAN kill with VLAN ID 0 happens in your system? There is no other way for a VLAN to be removed from the ALE.
    It could be triggered by an application.

    Best Regards,
    Sudheer

  • Hi,

    I can't find any VLAN kill in our system. How can I trace VLAN removal from the ALE?

    Best regards!

  • Hi,

    You can add some prints in the driver where "cpsw_ale_del_vlan" is triggered, to check where it is called from.
    It can be called from "am65_cpsw_nuss_ndo_slave_kill_vid" or "am65_cpsw_port_vlans_del"; print the VLAN ID and the API name there as well, to track all VLAN requests.

    After making the changes, build Linux and replace the Linux image in the file system with the newly built one.

    Best Regards,
    Sudheer

  • Hi

    I added prints in the driver and replaced the Linux image in the file system, but "cpsw_ale_del_vlan" is not called when the issue happens.

    How can I add VLAN 0 back when the issue happens?

    Best regards! 

  • Hi,

    > I added prints in the driver and replaced the Linux image in the file system, but "cpsw_ale_del_vlan" is not called when the issue happens.
    >
    > How can I add VLAN 0 back when the issue happens?

    Can you please share the logs along with the changes you made?

    Can you also add prints of the ALE entry and index in all of the ale_add APIs in cpsw_ale.c and share the log with us?

    Best Regards,
    Sudheer

  • Hi,

    Here are the changes in the Linux kernel code:

    diff --git a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    index b9ab087b6..0c5d63aba 100644
    --- a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    +++ b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    @@ -231,7 +231,7 @@ static int am65_cpsw_port_vlan_del(struct am65_cpsw_port *port, u16 vid,
    */
    cpsw_ale_del_mcast(cpsw->ale, port->ndev->broadcast, port_mask,
    ALE_VLAN, vid);
    - netdev_dbg(port->ndev, "VID del: %s: vid:%u ports:%X\n",
    + netdev_err(port->ndev, "VID del: %s: vid:%u ports:%X\n",
    port->ndev->name, vid, port_mask);
    return ret;
    diff --git a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/cpsw_ale.c b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/cpsw_ale.c
    index be75bb009..695840b45 100644
    --- a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/cpsw_ale.c
    +++ b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/cpsw_ale.c
    @@ -487,6 +487,7 @@ int cpsw_ale_add_ucast(struct cpsw_ale *ale, const u8 *addr, int port,
    return -ENOMEM;
    cpsw_ale_write(ale, idx, ale_entry);

    And the dmesg log:

    2654.dmesg.log
    [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd080]
    [ 0.000000] Linux version 5.10.41-1-5-1 (root@lcz) (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #9 SMP PREEMPT Tue May 21 11:00:12 CST 2024
    [ 0.000000] Machine model: Texas Instruments K3 J721E SoC
    [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000002800000 (options '')
    [ 0.000000] printk: bootconsole [ns16550a0] enabled
    [ 0.000000] efi: UEFI not found.
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a0000000, size 1 MiB
    [ 0.000000] OF: reserved mem: initialized node vision-apps-r5f-dma-memory@a0000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a0100000, size 15 MiB
    [ 0.000000] OF: reserved mem: initialized node vision_apps-r5f-memory@a0100000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a1000000, size 1 MiB
    [ 0.000000] OF: reserved mem: initialized node vision-apps-r5f-dma-memory@a1000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a1100000, size 15 MiB
    [ 0.000000] OF: reserved mem: initialized node vision-apps-r5f-memory@a1100000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a2000000, size 1 MiB
    [ 0.000000] OF: reserved mem: initialized node vision-apps-r5f-dma-memory@a2000000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a2100000, size 31 MiB
    [ 0.000000] OF: reserved mem: initialized node vision-apps-r5f-memory@a2100000, compatible id shared-dma-pool
    [ 0.000000] Reserved memory: created DMA memory pool at 0x00000000a4000000, size 1 MiB

  • Hi,

    The log seems to be just the boot-time log.

    Can you also add "idx" to the prints?
    Also, add prints for multicast additions and deletions so that all ALE additions and deletions are tracked.

    Please share the log from boot until the issue is observed. Also share the ALE dump when ping starts working and when it stops (i.e., the failure scenario), with the ALE debug prints enabled.

    Best Regards,
    Sudheer

  • Hi,

    There are no prints when ping stops.

    The ALE dump was shared in the reply above.

    Best Regards!

  • Hi,

    > There are no prints when ping stops.

    I suspect a MAC addition or removal is involved.
    In the ALE dump above, the multicast MAC "33:33:00:00:00:01" was added in entry 0. In the working scenario, entry 0 was the VLAN ID 0 entry.

    This could be the reason, which is why I requested prints at every ALE addition and deletion, showing entry details such as index, VLAN, and MAC.

    Best Regards,
    Sudheer

  • Hi,

    I added more prints in cpsw_ale.c:

    diff --git a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-nuss.c
    index 2b52c75f7..f006dc8ed 100644
    --- a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-nuss.c
    +++ b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-nuss.c
    @@ -334,6 +334,8 @@ static void am65_cpsw_nuss_ndo_slave_set_rx_mode(struct net_device *ndev)
    port_mask, 0, 0, 0);
    }
    }
    + netdev_err(ndev, "%s\n",__func__);
    +
    }
    static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
    diff --git a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    index b9ab087b6..0c5d63aba 100644
    --- a/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    +++ b/kernel/linux-5.10.41+gitAUTOINC+4c2eade9f7-g4c2eade9f7/drivers/net/ethernet/ti/am65-cpsw-switchdev.c
    @@ -231,7 +231,7 @@ static int am65_cpsw_port_vlan_del(struct am65_cpsw_port *port, u16 vid,
    */
    cpsw_ale_del_mcast(cpsw->ale, port->ndev->broadcast, port_mask,
    ALE_VLAN, vid);
     

    Here is the dmesg log when the issue happens:

    3666.dmesg_cpsw.log

  • Hi, we added more prints to the code and have an update.

    As the log at [11861.474667] shows, cpsw_ale_add_mcast was called with idx 0, but idx 0 holds the VLAN rule, which should not be overwritten.

    [11848.285540] am65-cpsw-nuss 46000000.ethernet eth0: am65_cpsw_nuss_ndo_slave_set_rx_mode
    [11848.296553] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_set_allmulti
    [11848.302935] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 1,port_mask 0x1
    [11848.311367] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 2,port_mask 0x1
    [11848.319808] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 3,port_mask 0x1
    [11848.328232] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 4,port_mask 0x1
    [11848.336657] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 5,port_mask 0x1
    [11848.345081] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 6,port_mask 0x1
    [11848.353508] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 7,port_mask 0x1
    [11848.361931] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 8,port_mask 0x1
    [11848.370358] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 9,port_mask 0x1
    [11848.378780] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 10,port_mask 0x1
    [11848.387293] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 11,port_mask 0x1
    [11848.395905] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_match_addr
    [11848.401994] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 3,flags:0
    [11848.410330] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 3,flags:0,port_mask 0x1,addr: 33:33:00:00:00:01
    [11848.422108] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_match_addr
    [11848.428195] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 4,flags:0
    [11848.436539] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 4,flags:0,port_mask 0x1,addr: 01:00:5e:00:00:01
    [11848.448312] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_match_addr
    [11848.454400] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 5,flags:0

    The "#1#" print is in the following function:

    This means entry idx 0 was reported as free and could be written. But it is hard to tell who changed it, because we added prints to all the functions that call cpsw_ale_write, and none of them appear in the log when the issue happens.

    While checking the code, we found a suspicious modification in SDK 9.2; we don't know whether it can cause the issue:

    Please help check the code and find the root cause.

    Best regards.

  • Hi,

    I saw the VLAN entry overwritten by a MAC entry in yesterday's log.
    I will check internally with the dev team and get back to you soon.

    > While checking the code, we found a suspicious modification in SDK 9.2; we don't know whether it can cause the issue:

    Let me check and update you on this.

    Best Regards,
    Sudheer

  • Hi,

    > While checking the code, we found a suspicious modification in SDK 9.2; we don't know whether it can cause the issue:
    >
    > Let me check and update you on this.

    No, the above change will not cause this issue.

    Can you please dump the ALE entries every time a multicast entry is added?

    i.e., print ale_entry[i] inside the for loop in cpsw_ale_read, along with "i":

    + dev_err(ale->params.dev, "%s  i: %d, value: 0x%x\n",__func__,i, ale_entry[i]);

    Best Regards,
    Sudheer

  • Hi,

    I added prints in the function cpsw_ale_add_mcast:

    int cpsw_ale_add_mcast(struct cpsw_ale *ale, const u8 *addr, int port_mask,
                           int flags, u16 vid, int mcast_state)
    {
        u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
        u32 ale_entry1[ALE_ENTRY_WORDS] = {0, 0, 0};
        int idx, idx1, mask;

        idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
        if (idx >= 0)
            cpsw_ale_read(ale, idx, ale_entry);

        cpsw_ale_set_vlan_entry_type(ale_entry, flags, vid);
        cpsw_ale_set_addr(ale_entry, addr);
        cpsw_ale_set_super(ale_entry, (flags & ALE_SUPER) ? 1 : 0);
        cpsw_ale_set_mcast_state(ale_entry, mcast_state);

        mask = cpsw_ale_get_port_mask(ale_entry,
                                      ale->port_mask_bits);
        port_mask |= mask;

    And the dmesg log when the issue happens:

    [ 9486.169103] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 11,port_mask 0x1
    [ 9486.177721] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 3,flags:0
    [ 9486.186058] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 3,flags:0,port_mask 0x1,addr: 33:33:00:00:00:01
    [ 9486.197756] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 4,flags:0
    [ 9486.206094] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 4,flags:0,port_mask 0x1,addr: 01:00:5e:00:00:01
    [ 9486.217791] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 5,flags:0
    [ 9486.226127] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 5,flags:0,port_mask 0x1,addr: 01:00:5e:00:00:fc
    [ 9486.237823] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 6,flags:0
    [ 9486.246160] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 6,flags:0,port_mask 0x1,addr: 33:33:ff:59:c7:74
    [ 9486.257857] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 7,flags:0
    [ 9486.266193] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 7,flags:0,port_mask 0x1,addr: 01:00:5e:00:00:fb
    [ 9486.277890] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 8,flags:0
    [ 9486.286238] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 8,flags:0,port_mask 0x1,addr: 33:33:00:01:00:03
    [ 9486.297937] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 9,flags:0
    [ 9486.306272] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 9,flags:0,port_mask 0x1,addr: 33:33:00:00:00:fb
    [ 9486.317970] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #1# vid: 0,idx : 10,flags:0
    [ 9486.326396] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast #4# vid: 0,idx : 10,flags:0,port_mask 0x1,addr: 01:00:5e:7f:00:01
    [ 9486.338119] am65-cpsw-nuss 46000000.ethernet eth0: am65_cpsw_nuss_ndo_slave_set_rx_mode
    [ 9486.459448] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_set_allmulti
    [ 9486.465800] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 1,port_mask 0x1
    [ 9486.474232] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_flush_multicast idx: 2,port_mask 0x1

    The log shows that cpsw_ale_match_free returns 0 the first time, which is incorrect; the second time it returns 7, which is correct.

    I also have a question: is the function cpsw_ale_match_free reentrant?

  • Hi,

    > The log shows that cpsw_ale_match_free returns 0 the first time, which is incorrect; the second time it returns 7, which is correct.

    The code you added is wrong.

    You use idx1 to get the free entry, but idx when writing the entry (cpsw_ale_write).

    > i.e., print ale_entry[i] inside the for loop in cpsw_ale_read, along with "i":
    >
    > + dev_err(ale->params.dev, "%s  i: %d, value: 0x%x\n",__func__,i, ale_entry[i]);

    I mean: add the print inside the for loop of the cpsw_ale_read function; it will then print the full ALE table every time.

    > I also have a question: is the function cpsw_ale_match_free reentrant?

    No. It is only entered when cpsw_ale_match_addr does not find the MAC address and returns an error (-ENOENT).

    Best Regards,
    Sudheer

  • Hi,

    I used idx1 only for testing.

    So why does cpsw_ale_match_free read a different value on the two calls when the issue happens?

    Best regards!

  • Hi,

    > So why does cpsw_ale_match_free read a different value on the two calls when the issue happens?

    My suspicion is that the ALE read may have failed in the first iteration of cpsw_ale_match_free.
    That is why I requested prints in the cpsw_ale_read API to get the ALE table, as mentioned below.

    > i.e., print ale_entry[i] inside the for loop in cpsw_ale_read, along with "i":
    >
    > + dev_err(ale->params.dev, "%s  i: %d, value: 0x%x\n",__func__,i, ale_entry[i]);
    >
    > I mean: add the print inside the for loop of the cpsw_ale_read function; it will then print the full ALE table every time.


    Best Regards,
    Sudheer
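The suspected mechanism can be reproduced in a small user-space model. The sketch mirrors the shape of cpsw_ale_match_free() (take the first entry whose type reads back as free) and of the final write in cpsw_ale_add_mcast(), on a simulated 8-entry table. The one-shot fault that returns all zeros is a hypothetical stand-in for the failed hardware read, and every name here is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ALE_ENTRY_WORDS 3
#define SIM_ENTRIES     8
#define ALE_TYPE_FREE   0

/* Simulated ALE table with a one-shot read fault. */
struct sim_ale {
    uint32_t table[SIM_ENTRIES][ALE_ENTRY_WORDS];
    int fail_next_read;          /* if set, the next read returns all zeros */
};

/* Entry type = bits 61:60, i.e. bits 29:28 of word 1 in cpsw_ale.c word order. */
static int entry_type(const uint32_t e[ALE_ENTRY_WORDS])
{
    return (int)((e[1] >> 28) & 0x3);
}

static void sim_read(struct sim_ale *s, int idx, uint32_t out[ALE_ENTRY_WORDS])
{
    assert(idx >= 0 && idx < SIM_ENTRIES);
    if (s->fail_next_read) {
        s->fail_next_read = 0;
        memset(out, 0, ALE_ENTRY_WORDS * sizeof(uint32_t));
        return;
    }
    memcpy(out, s->table[idx], ALE_ENTRY_WORDS * sizeof(uint32_t));
}

/* Like cpsw_ale_match_free(): the first entry that reads back as FREE wins.
 * Every read is trusted, so one bogus all-zero read marks an occupied slot free. */
static int sim_match_free(struct sim_ale *s)
{
    uint32_t e[ALE_ENTRY_WORDS];

    for (int idx = 0; idx < SIM_ENTRIES; idx++) {
        sim_read(s, idx, e);
        if (entry_type(e) == ALE_TYPE_FREE)
            return idx;
    }
    return -1;                   /* the driver returns -ENOENT here */
}

/* Like the tail of cpsw_ale_add_mcast() for a new address: pick a free slot and write. */
static int sim_add_entry(struct sim_ale *s, const uint32_t e[ALE_ENTRY_WORDS])
{
    int idx = sim_match_free(s);

    if (idx >= 0)
        memcpy(s->table[idx], e, ALE_ENTRY_WORDS * sizeof(uint32_t));
    return idx;
}
```

With entry 0 holding the VLAN 0 entry, a healthy scan finds the real free slot, while a single zero read makes the scan stop at entry 0 and the new multicast entry replaces the VLAN entry, which matches the ALE dump where 33:33:00:00:00:01 ended up in entry 0.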

  • Hi,

    The ALE entry read as 0x00000000 0x00000000 0x00000000 when the read failed the first time.

    The correct value, 0x00000000 0x20000000 0x03100003, was read the second time.

    [ 5050.168014] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_match_free , idx : 0, ale_entry: 0x00000000 0x00000000 0x00000000
    [ 5050.178880] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast ############ vid: 0,idx : 0, ale_entry: 0x00000000 0x20000000 0x03100003
    [ 5050.191226] am65-cpsw-nuss 46000000.ethernet: cpsw_ale_add_mcast ############ vid: 0,idx1 : 5,flags:0

    Best regards!
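The two reads can be decoded by hand. As I read the DEFINE_ALE_FIELD table in cpsw_ale.c, an entry is 68 bits held in three words, with ale_entry[2] carrying bits 31:0 and ale_entry[0] the top bits; the entry type is bits 61:60, the VLAN ID bits 59:48, and the MAC address of an address entry bits 47:0. This sketch uses those field positions (treat them as an assumption) with hypothetical helper names. It shows that the all-zero read decodes as a free entry, which is what lets cpsw_ale_match_free() return 0, while the second read decodes as the VLAN entry for VID 0.

```c
#include <assert.h>
#include <stdint.h>

#define ALE_ENTRY_WORDS 3

enum ale_type { ALE_TYPE_FREE = 0, ALE_TYPE_ADDR = 1, ALE_TYPE_VLAN = 2, ALE_TYPE_VLAN_ADDR = 3 };

/* Extract `bits` bits starting at bit `start` of a 68-bit ALE entry.
 * Same word flip as cpsw_ale_get_field(): bits 31:0 live in word 2. */
static uint32_t ale_get_field(const uint32_t e[ALE_ENTRY_WORDS], unsigned start, unsigned bits)
{
    unsigned idx = start / 32;

    assert(idx < ALE_ENTRY_WORDS && bits > 0 && bits < 32);
    start -= idx * 32;
    idx = 2 - idx;                                   /* flip word order */
    return (e[idx] >> start) & ((1u << bits) - 1);
}

static enum ale_type ale_entry_type(const uint32_t e[ALE_ENTRY_WORDS])
{
    return (enum ale_type)ale_get_field(e, 60, 2);
}

static unsigned ale_vlan_id(const uint32_t e[ALE_ENTRY_WORDS])
{
    return ale_get_field(e, 48, 12);
}

/* Copy the 48-bit address field into addr[6] (meaningful for address entries only). */
static void ale_get_addr(const uint32_t e[ALE_ENTRY_WORDS], uint8_t addr[6])
{
    for (int i = 0; i < 6; i++)
        addr[i] = (uint8_t)ale_get_field(e, (unsigned)(40 - 8 * i), 8);  /* addr[0] is bits 47:40 */
}
```

An all-zero entry has type 0 (free), so any code that trusts such a read will treat an occupied slot as available.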

  • Hi,

    Can you please add a delay (udelay(10);) in cpsw_ale_read after writing the ALE TABLE CONTROL register, i.e., before reading the entry words?


    With the above change, can you please check whether the issue is still reproduced?

    Best Regards,
    Sudheer

  • Hi,

    With the delay (udelay(10);) added in cpsw_ale_read, the issue can still be reproduced.

    Best Regards

  • Hi,

    I have reported the issue internally to the IP team to confirm the procedure for accessing the ALE table.

    Can you check whether increasing the delay helps?

    If it still does not work with a longer delay, can you try writing the ALE TABLE CONTROL register twice, i.e.:

    writel_relaxed(idx,ale->params.ale_regs + ALE_TABLE_CONTROL); 
    writel_relaxed(idx,ale->params.ale_regs + ALE_TABLE_CONTROL); 

    Best Regards,
    Sudheer

  • Hi,

    With the ALE TABLE CONTROL register written twice, the issue is not reproduced.

    Best Regards

  • > With the ALE TABLE CONTROL register written twice, the issue is not reproduced.

    Hi Yongliang,

    Is this issue resolved by writing the ALE TABLE CONTROL register twice?

    Can this case be closed?

    Regards

       Semon

  • Hi,

    > With the ALE TABLE CONTROL register written twice, the issue is not reproduced.

    Thanks for the confirmation; please keep this patch.
    We will address this issue in a future release of the TI SDK.

    Best Regards,
    Sudheer

  • Hi,

    Yes, this issue is resolved. Thanks for your support!

    Best Regards