[FAQ] TDA4VL: How to resolve NETDEV WATCHDOG: transmit queue timed out issue in Linux (A72) in case of SBL optimized boot flow?

Part Number: TDA4VL

How to resolve NETDEV WATCHDOG: transmit queue timed out issue in Linux (A72) in case of SBL optimized boot flow?

  • The transmit queue timeout and the resulting watchdog log occur because the CPSW MAC port interface configuration is not reflected in the ENET_CTRL_MMR register. Reading the interface value from CTRL_MMR returns only the default value, even after the driver has loaded.

    The MAC port interface selection is not reflected in CTRL_MMR because the CTRL_MMR registers are locked by default, and Linux does not unlock them.

    A single lock/unlock register protects multiple CTRL_MMR registers, so Linux cannot perform the unlock in individual drivers: doing so would race against a lock/unlock performed by a different subsystem.

    In SBL development boot mode or SPL mode, U-Boot unlocks all CTRL_MMR registers, so the MAC port interface selection is reflected in the ENET_CTRL_MMR register.

    In SBL optimized boot mode, U-Boot is not present and SBL loads the Linux image directly. Nothing unlocks the CTRL_MMR registers, so the RGMII interface selection never takes effect in ENET_CTRL_MMR: a write to a locked register does not change its value.

    As a result, the interface value in CTRL_MMR remains RMII (the default value), even though the interface is configured as RGMII in the device-tree files.
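
    The behaviour described above can be illustrated with a small host-side model. This is only a sketch, not driver or SBL code: the struct, the function names, and the MODE_* encodings are hypothetical and do not correspond to the TRM register layout. Only the two unlock values are taken from the SBL patch in this FAQ. It shows that a write to a protected register is silently dropped unless the partition was unlocked first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unlock values taken from the SBL patch in this FAQ. */
#define KICK0_UNLOCK_VAL 0x68ef3490u
#define KICK1_UNLOCK_VAL 0xd172bc5au

/* Hypothetical mode encodings, for illustration only (not TRM values). */
enum { MODE_RMII = 1, MODE_RGMII = 2 };

/*
 * Software model of one CTRL_MMR partition: two kick registers guard
 * one protected register. This is NOT a hardware register map.
 */
struct mmr_partition_model {
    uint32_t kick0;
    uint32_t kick1;
    uint32_t enet_ctrl; /* stands in for the MAC port interface select */
};

/* The partition counts as unlocked only when both kick values match. */
static bool model_is_unlocked(const struct mmr_partition_model *p)
{
    return p->kick0 == KICK0_UNLOCK_VAL && p->kick1 == KICK1_UNLOCK_VAL;
}

/* Two-step unlock, mirroring the sequence used in the SBL patch. */
static void model_unlock(struct mmr_partition_model *p)
{
    p->kick0 = KICK0_UNLOCK_VAL;
    p->kick1 = KICK1_UNLOCK_VAL;
}

/* A write to the protected register is dropped while the partition is locked. */
static void model_write_enet_ctrl(struct mmr_partition_model *p, uint32_t val)
{
    if (model_is_unlocked(p))
        p->enet_ctrl = val;
}

/*
 * Returns the mode left in the modelled register after a driver-style
 * write of RGMII, with or without a prior bootloader unlock.
 */
static uint32_t model_driver_select_rgmii(bool bootloader_unlocked)
{
    struct mmr_partition_model p = { 0, 0, MODE_RMII }; /* locked, default mode */

    if (bootloader_unlocked)
        model_unlock(&p);
    model_write_enet_ctrl(&p, MODE_RGMII);
    return p.enet_ctrl;
}
```

    In this model, model_driver_select_rgmii(false) corresponds to the SBL optimized boot flow (the register keeps its default), and model_driver_select_rgmii(true) corresponds to a boot flow where U-Boot or SBL has already unlocked the partition.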

    Error log:

    root@j721s2-evm:~# [   68.163894] ------------[ cut here ]------------
    [   68.168517] NETDEV WATCHDOG: eth0 (am65-cpsw-nuss): transmit queue 0 timed out
    [   68.175762] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x320/0x328
    [   68.184003] Modules linked in: bluetooth ecdh_generic ecc rfkill rpmsg_char crct10dif_ce phy_can_transceiver ti_k3_r5_remoteproc ti_k3_dsp_remoteproc sa2ul cdns_dsi wave5 virtio_rpmsg_bus pvrsrvkm(O) sha512_generic authenc v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common cdns_dphy m_can_platform m_can can_dev optee_rng rng_core sch_fq_codel rpmsg_kdrv_switch cryptodev(O) ipv6
    [   68.220208] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.10.162-g76b3e88d56 #7
    [   68.228708] Hardware name: Texas Instruments J721S2 EVM (DT)
    [   68.234349] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
    [   68.240338] pc : dev_watchdog+0x320/0x328
    [   68.244332] lr : dev_watchdog+0x320/0x328
    [   68.248326] sp : ffff800010003db0
    [   68.251627] x29: ffff800010003db0 x28: ffff000002e39940 
    [   68.256922] x27: 0000000000000004 x26: 0000000000000140 
    [   68.262218] x25: 00000000ffffffff x24: 0000000000000000 
    [   68.267513] x23: ffff000002e383dc x22: ffff000002e38000 
    [   68.272809] x21: ffff000002e38480 x20: ffff800011197000 
    [   68.278104] x19: 0000000000000000 x18: 0000000000000010 
    [   68.283399] x17: 0000000000000000 x16: 0000000000000000 
    [   68.288694] x15: ffff8000111a1f10 x14: 00000000000001d5 
    [   68.293989] x13: ffff8000111a1f10 x12: 00000000ffffffea 
    [   68.299284] x11: ffff8000112205b0 x10: ffff800011208570 
    [   68.304580] x9 : ffff8000112085c8 x8 : 0000000000017fe8 
    [   68.309876] x7 : c0000000ffffefff x6 : 0000000000000003 
    [   68.315171] x5 : 0000000000000000 x4 : 0000000000000000 
    [   68.320466] x3 : 0000000000000100 x2 : 0000000000000100 
    [   68.325761] x1 : 07fe66d061f82d00 x0 : 0000000000000000 
    [   68.331056] Call trace:
    [   68.333492]  dev_watchdog+0x320/0x328
    [   68.337144]  call_timer_fn.isra.0+0x24/0x80
    [   68.341312]  run_timer_softirq+0x400/0x438
    [   68.345394]  efi_header_end+0x120/0x268
    [   68.349216]  irq_exit+0xc0/0xe0
    [   68.352345]  __handle_domain_irq+0x68/0xc0
    [   68.356427]  gic_handle_irq+0x58/0x128
    [   68.360160]  el1_irq+0xcc/0x180
    [   68.363291]  arch_cpu_idle+0x18/0x28
    [   68.366853]  default_idle_call+0x20/0x68
    [   68.370762]  do_idle+0xc0/0x128
    [   68.373889]  cpu_startup_entry+0x28/0x60
    [   68.377797]  rest_init+0xd4/0xe4
    [   68.381012]  arch_call_rest_init+0x10/0x1c
    [   68.385092]  start_kernel+0x478/0x4b0
    [   68.388740] ---[ end trace aa61dc5ce5a7d431 ]---
    [   68.393353] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:7716 dql_avail:-38 free_desc:505
    [   74.051898] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:13600 dql_avail:-38 free_desc:505
    [   79.171898] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:18720 dql_avail:-38 free_desc:505
    [   85.059897] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:24608 dql_avail:-38 free_desc:505
    [   89.923899] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:29472 dql_avail:-38 free_desc:505
    [   95.043897] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:34592 dql_avail:-38 free_desc:505
    [  100.163897] am65-cpsw-nuss 46000000.ethernet eth0: txq:0 DRV_XOFF:0 tmo:39712 dql_avail:-38 free_desc:505


    Fix:

    Unlock the CTRL_MMR registers in SBL.

    Refer to the changes below and apply the patch to <PSDK-RTOS>/pdk/packages/ti/boot/sbl/k3/sbl_main.c. The patch unlocks all CTRL_MMR partitions from SBL.
    +#define WKUP_CTRL_MMR0_BASE			0x43000000
    +#define MCU_CTRL_MMR0_BASE			0x40f00000
    +#define CTRL_MMR0_BASE				0x00100000
    
    
    +/*
    + * The CTRL_MMR0 memory space is divided into several equally-spaced
    + * partitions, so defining the partition size allows us to determine
    + * register addresses common to those partitions.
    +*/
    +#define CTRL_MMR0_PARTITION_SIZE		0x4000
    
    +/*
    + * CTRL_MMR0, WKUP_CTRL_MMR0, and MCU_CTRL_MMR0 lock/kick-mechanism
    + * shared register definitions. The same registers are also used for
    + * PADCFG_MMR lock/kick-mechanism.
    +*/
    +#define CTRLMMR_LOCK_KICK0			0x1008
    +#define CTRLMMR_LOCK_KICK0_UNLOCK_VAL		0x68ef3490
    +#define CTRLMMR_LOCK_KICK1			0x100c
    +#define CTRLMMR_LOCK_KICK1_UNLOCK_VAL		0xd172bc5a
    
    
    +void CTRL_MMR_unlock(uint32_t baseAddr, uint32_t partition)
    +{
    +	/* Get the part base address */
    +	uint32_t partBaseAddr = baseAddr + (partition * CTRL_MMR0_PARTITION_SIZE);
    +
    +	/* Unlock the requested partition if locked using two-step sequence */
    +	*(volatile uint32_t *)(partBaseAddr + CTRLMMR_LOCK_KICK0) = CTRLMMR_LOCK_KICK0_UNLOCK_VAL;
    +	*(volatile uint32_t *)(partBaseAddr + CTRLMMR_LOCK_KICK1) = CTRLMMR_LOCK_KICK1_UNLOCK_VAL;
    +}
    +
    +static void SBL_CTRL_MMR_unlock_all(void)
    +{
    +	/* Unlock all WKUP_CTRL_MMR0 module registers */
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 0);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 1);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 2);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 3);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 4);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 6);
    +	CTRL_MMR_unlock(WKUP_CTRL_MMR0_BASE, 7);
    +
    +	/* Unlock all MCU_CTRL_MMR0 module registers */
    +	CTRL_MMR_unlock(MCU_CTRL_MMR0_BASE, 0);
    +	CTRL_MMR_unlock(MCU_CTRL_MMR0_BASE, 1);
    +	CTRL_MMR_unlock(MCU_CTRL_MMR0_BASE, 2);
    +	CTRL_MMR_unlock(MCU_CTRL_MMR0_BASE, 3);
    +	CTRL_MMR_unlock(MCU_CTRL_MMR0_BASE, 4);
    +
    +	/* Unlock all CTRL_MMR0 module registers */
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 0);
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 1);
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 2);
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 3);
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 5);
    +#if defined(SOC_J721S2)
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 6);
    +#endif
    +	CTRL_MMR_unlock(CTRL_MMR0_BASE, 7);
    +}
    
    
    int main()
    {
       ......
       
        /* Any SoC specific Init. */
        SBL_SocEarlyInit();
    
    +    /* Unlock control MMR registers */
    +    SBL_CTRL_MMR_unlock_all();
    
        if (SBL_LOG_LEVEL > SBL_LOG_ERR)
        {
            /* Configure UART Tx pinmux. */
            Board_uartTxPinmuxConfig();
        }
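
    To cross-check the patch against the TRM, it can help to compute the absolute KICK0/KICK1 addresses that each CTRL_MMR_unlock() call writes. The helpers below are a minimal sketch for that purpose: kick0_addr() and kick1_addr() are hypothetical names (not part of the PDK), and the constants are copied from the patch above.

```c
#include <stdint.h>

/* Values copied from the SBL patch above. */
#define WKUP_CTRL_MMR0_BASE      0x43000000u
#define MCU_CTRL_MMR0_BASE       0x40f00000u
#define CTRL_MMR0_BASE           0x00100000u
#define CTRL_MMR0_PARTITION_SIZE 0x4000u
#define CTRLMMR_LOCK_KICK0       0x1008u
#define CTRLMMR_LOCK_KICK1       0x100cu

/*
 * Hypothetical helper (not a PDK API): absolute address of the KICK0
 * register for the given CTRL_MMR base and partition index.
 */
static uint32_t kick0_addr(uint32_t base, uint32_t partition)
{
    return base + partition * CTRL_MMR0_PARTITION_SIZE + CTRLMMR_LOCK_KICK0;
}

/* Hypothetical helper (not a PDK API): same computation for KICK1. */
static uint32_t kick1_addr(uint32_t base, uint32_t partition)
{
    return base + partition * CTRL_MMR0_PARTITION_SIZE + CTRLMMR_LOCK_KICK1;
}
```

    For example, kick0_addr(CTRL_MMR0_BASE, 1) evaluates to 0x00105008, and kick1_addr(MCU_CTRL_MMR0_BASE, 1) evaluates to 0x40f0500c. These are the addresses to compare against the lock registers listed in the TRM for the target SoC.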

    Note:
    The changes above are for TDA4VL (as a reference). For the CTRL_MMRs of other SoCs, refer to the TRM or the U-Boot source code (<Linux SDK>/board-support/u-boot-xxxx/arch/arm/mach-k3/j721e_init.c, j721s2_init.c, or j784s4_init.c).

    After applying the above changes, build "sbl_mmcsd_img_hlos" and "sbl_lib_mmcsd_hlos", then follow step 5 and step 7 from the FAQ [SBL flow with combined app image].

    Best Regards,
    Sudheer