PROCESSOR-SDK-AM65X: ethernet transmit queue timeouts

Part Number: PROCESSOR-SDK-AM65X

Hello,

We're seeing transmit queue timeouts on some PRU Ethernet ports when testing with iperf3 on the latest available Processor SDK, v07.00.01. This occurs on both our custom hardware and TI's SR2.0 IDK. On the IDK the problem usually shows up as iperf3 timing out, whereas on our custom hardware (which uses exactly the same configuration as TI's IDK) we also see kernel panics more often.

Log is attached below.

Thanks,
Matt McKee

iperf -c 192.168.3.211 -i 2 -t 100
------------------------------------------------------------
Client connecting to 192.168.3.211, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.3.210 port 46614 connected with 192.168.3.211 port 5001
[   72.378183] ------------[ cut here ]------------
[   72.382824] NETDEV WATCHDOG: eth1 (icssg-prueth): transmit queue 0 timed out
[   72.389958] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x2ec/0x2f8
[   72.398207] Modules linked in: md5 ecb aes_neon_bs aes_neon_blk des_generic libdes xhci_plat_hcd cbc xhci_hcd usbcore omap_rng rng_core dwc3 udc_core usb_common icssg_prueth crct10dif_ce pru_rproc irq_pruss_intc ti_k3_r5_remoteproc virtio_rpmsg_bus edt_ft5x06 m_can_platform m_can ti_am335x_tscadc can_dev pci_endpoint_test pruss pvrsrvkm(O) ti_cal v4l2_fwnode sa2ul phy_omap_usb2 sha512_generic dwc3_keystone authenc at24 sch_fq_codel jailhouse(O) cryptodev(O) ipv6 nf_defrag_ipv6
[   72.440438] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           O      5.4.40-g56040b97ff #1
[   72.448774] Hardware name: PHYTEC phyCORE-AM65x Carrier Board (DT)
[   72.454947] pstate: 20000005 (nzCv daif -PAN -UAO)
[   72.459732] pc : dev_watchdog+0x2ec/0x2f8
[   72.463733] lr : dev_watchdog+0x2ec/0x2f8
[   72.467732] sp : ffff80001006fdb0
[   72.471036] x29: ffff80001006fdb0 x28: ffff000071533c80
[   72.476338] x27: 0000000000000004 x26: 0000000000000140
[   72.481640] x25: 00000000ffffffff x24: 0000000000000003
[   72.486941] x23: ffff00006272541c x22: ffff000062725000
[   72.492243] x21: ffff000062725440 x20: ffff800010fd1000
[   72.497544] x19: 0000000000000000 x18: 0000000000000010
[   72.502845] x17: 0000000000000000 x16: 0000000000000000
[   72.508147] x15: ffff00007f8cbb28 x14: ffffffffffffffff
[   72.513448] x13: ffff80009006fb07 x12: ffff80001006fb0f
[   72.518748] x11: ffff800010fea000 x10: ffff8000110a6b30
[   72.524049] x9 : 0000000000000000 x8 : ffff8000110a7000
[   72.529350] x7 : ffff800010593618 x6 : 00000000000001e7
[   72.534651] x5 : 0000000000000000 x4 : 0000000000000008
[   72.539951] x3 : 0000000000000004 x2 : 0000000000000100
[   72.545252] x1 : a3f10b9d2d9fe300 x0 : 0000000000000000
[   72.550554] Call trace:
[   72.552999]  dev_watchdog+0x2ec/0x2f8
[   72.556658]  call_timer_fn.isra.0+0x20/0x78
[   72.560832]  run_timer_softirq+0x1a0/0x408
[   72.564922]  __do_softirq+0x120/0x23c
[   72.568579]  irq_exit+0xb8/0xd8
[   72.571716]  __handle_domain_irq+0x64/0xb8
[   72.575803]  gic_handle_irq+0x5c/0x148
[   72.579544]  el1_irq+0xb8/0x180
[   72.582681]  arch_cpu_idle+0x10/0x18
[   72.586251]  do_idle+0xc0/0x140
[   72.589386]  cpu_startup_entry+0x24/0x40
[   72.593305]  secondary_start_kernel+0x148/0x180
[   72.597825] ---[ end trace 04229aa2d8c8b046 ]---
[   72.602458] icssg-prueth pruss1_eth eth1: xmit timeout
[   78.276164] icssg-prueth pruss1_eth eth1: xmit timeout
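
For anyone triaging a similar capture, the watchdog lines in a log like the one above can be pulled out of a saved dmesg file with a short script. This is only a minimal sketch, not a TI tool: the function name `find_tx_timeouts` and the regular expression are assumptions based solely on the message format shown in this log.

```python
import re

# Assumed format, taken from the log in this thread, e.g.:
# [   72.382824] NETDEV WATCHDOG: eth1 (icssg-prueth): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NETDEV WATCHDOG: "
    r"(?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return one dict per NETDEV WATCHDOG line found in a dmesg capture."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            events.append({
                "time": float(match.group("time")),      # seconds since boot
                "iface": match.group("iface"),           # e.g. eth1
                "driver": match.group("driver"),         # e.g. icssg-prueth
                "queue": int(match.group("queue")),      # transmit queue index
            })
    return events
```

Calling `find_tx_timeouts(open("dmesg.txt").read())` (the file name is hypothetical) gives a list that can be grouped by interface to see which ports are affected.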

  • Hi Matthew,

    Sorry for the delay.

    Does the PRU recover once this happens? What do you see if you try to transmit a single packet after the timeout?

    Have you made any changes to the driver? If so, please post the patches.

    Regards

    Vineet
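
One way to answer the recovery question is to compare the interface's transmit counters before and after sending a single packet (for example, one ping). The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/class/net. The helper names are hypothetical, and the per-queue `tx_timeout` file may not be present on every kernel, in which case it is reported as `None`.

```python
from pathlib import Path


def read_counter(path):
    """Read an integer counter from a sysfs file; return None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def tx_snapshot(iface, queue=0, sysfs_root="/sys/class/net"):
    """Collect the transmit counters of one interface into a dict."""
    base = Path(sysfs_root) / iface
    return {
        "tx_packets": read_counter(base / "statistics" / "tx_packets"),
        "tx_errors": read_counter(base / "statistics" / "tx_errors"),
        "tx_timeout": read_counter(base / "queues" / f"tx-{queue}" / "tx_timeout"),
    }


def port_recovered(before, after):
    """True if tx_packets advanced between the two snapshots."""
    if before["tx_packets"] is None or after["tx_packets"] is None:
        return False
    return after["tx_packets"] > before["tx_packets"]
```

Take one snapshot with `tx_snapshot("eth1")` after the timeout, send a single packet, take a second snapshot, and pass both to `port_recovered`. If `tx_packets` does not advance, the port has likely not recovered.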

  • Hi Vineet, we are facing a similar issue with a silicon Rev2 chip, but only with our custom rootfs and not with the TI SDK rootfs. We have made one change in the driver:

    diff --git a/drivers/net/ethernet/ti/icssg_prueth.c b/drivers/net/ethernet/ti/icssg_prueth.c
    index d4b674ac3..6ba071a60 100644
    --- a/drivers/net/ethernet/ti/icssg_prueth.c
    +++ b/drivers/net/ethernet/ti/icssg_prueth.c
    @@ -1952,7 +1952,7 @@ static int prueth_netdev_init(struct prueth *prueth,

             /* get mac address from DT and set private and netdev addr */
             mac_addr = of_get_mac_address(eth_node);
    -        if (mac_addr)
    +        if (!IS_ERR(mac_addr))
                     ether_addr_copy(ndev->dev_addr, mac_addr);
             if (!is_valid_ether_addr(ndev->dev_addr)) {
                     eth_hw_addr_random(ndev);

    Do you have a solution to this problem? We are using dual Ethernet on one ICSSG group; in total we have four Ethernet ports on ICSSG1 and ICSSG2.

  • Hi Sarfaraz,

    Are you working with Matthew (the original poster) here in some capacity?

    Is this your ticket?

    Regards

    Vineet

  • Hi Matthew,

    Can you post the patches for any changes that you may have made?

    Regards

    Vineet

  • Hi Vineet,

    We have made no changes to the drivers or firmware related to ethernet.

    Thanks,
    Matt McKee

  • Matthew McKee said:
    [  3] local 192.168.3.210 port 46614 connected with 192.168.3.211 port 5001
    [   72.378183] ------------[ cut here ]------------
    [   72.382824] NETDEV WATCHDOG: eth1 (icssg-prueth): transmit queue 0 timed out

    I see the same failure signature in our internal bug tracker under LCPD-19648, which is marked as fixed in Linux SDK 7.01, due out shortly. That bug report says it was seen specifically on SR2.0 silicon, which matches the reports here. In short, I expect you will have a fix very shortly, when SDK 7.1 is released.

  • Hi,

    Sorry for the delay. Can you check on SDK 7.1 and report back?

    Regards

    Vineet

  • SDK 7.1 has fixed the issue. Thank you.