AM6442: USB device configure cause Linux system hang (2)

Part Number: AM6442
Other Parts Discussed in Thread: TMDS64EVM

Starting a new thread since the previous thread is locked.

I'm seeing the same hang when configuring a USB gadget. It fails intermittently, but the board usually hangs within 50 to 200 iterations. Only a power cycle (or a watchdog reset) is able to recover it.

The problem can be triggered by running the following loop in the gadget's configfs directory after the USB gadget has been configured:

    while true; do sleep 0.5; echo "" > UDC; sleep 0.5; echo f400000.usb > UDC; done

The sleeps are optional; their presence or absence doesn't seem to change the behavior.
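
For longer unattended runs, the same unbind/bind loop can be sketched in C so that the iteration count is printed on every cycle and any write error stops the test. This is a minimal sketch, not the script used here: the function names are hypothetical, the path argument is assumed to be the gadget's configfs UDC attribute (for example /sys/kernel/config/usb_gadget/g1/UDC, where g1 is a placeholder gadget name), and f400000.usb is the UDC name from the loop above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Write `val` plus a newline to `path`, like `echo val > path`.
 * Returns 0 on success or a negative errno value on failure. */
static int udc_write(const char *path, const char *val)
{
    char buf[128];
    int len = snprintf(buf, sizeof(buf), "%s\n", val);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -EINVAL;

    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, buf, (size_t)len);
    int err = (n == len) ? 0 : (n < 0 ? -errno : -EIO);
    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}

/* Sleep for `ms` milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(unsigned ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (long)(ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
        ;
}

/* Unbind and rebind the gadget `iterations` times (0 means forever).
 * The iteration number goes to stderr (unbuffered) before each cycle, so the
 * last number on the console shows roughly where a hang occurred.
 * Returns the number of completed cycles or a negative errno value. */
static long udc_toggle_loop(const char *udc_path, const char *udc_name,
                            unsigned long iterations, unsigned delay_ms)
{
    unsigned long i;

    for (i = 0; iterations == 0 || i < iterations; i++) {
        fprintf(stderr, "%lu\n", i + 1);

        int err = udc_write(udc_path, "");        /* unbind: echo "" > UDC */
        if (err == 0) {
            sleep_ms(delay_ms);
            err = udc_write(udc_path, udc_name);  /* bind: echo <udc> > UDC */
        }
        if (err) {
            fprintf(stderr, "write to %s failed: %s\n", udc_path, strerror(-err));
            return err;
        }
        sleep_ms(delay_ms);
    }
    return (long)i;
}
```

Calling udc_toggle_loop() with the UDC path, "f400000.usb", 0 iterations and a 500 ms delay from a small main() would mirror the shell loop indefinitely.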

The specific type of USB gadget doesn't seem to matter: RNDIS, NCM, and even a single ACM port are each sufficient to trigger the hang.

Compared with the previous ticket, my differences include:

  • both sides are running Linux (the USB host is a PC with Fedora/Debian)
  • the TI AM64x is running a 6.12-rt kernel. I have also tested 6.1 and 6.6, as well as non-RT kernels; all fail similarly.
  • I have SR1.0 silicon, which is evidently not ideal; however, based on the previous forum post, SR2.0 fails in the same way
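
Since the silicon revision matters in this comparison, it is worth recording it with every test run. On K3 SoCs the k3-socinfo driver normally reports it through the generic SoC sysfs interface, for example /sys/devices/soc0/revision containing SR1.0 or SR2.0; treat that path and string format as assumptions to verify on the target. A minimal sketch with hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs attribute into `buf`, without the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if `revision` (e.g. the contents of /sys/devices/soc0/revision)
 * names SR1.0 silicon, 0 otherwise. */
static int is_sr1_0(const char *revision)
{
    return strcmp(revision, "SR1.0") == 0;
}
```

A test harness could read the revision once at start-up and print a warning when is_sr1_0() is true, so every hang log records which silicon it came from.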

I have added many printk() calls inside the cdns3_gadget_udc_start() and cdns3_gadget_udc_stop() functions, including printing every register read/write done by the cdns3 driver and every interrupt entry/exit. The hang occurs shortly after the USB pull-up is enabled or disabled, but it does not trigger in exactly the same spot each time; occasionally it triggers while the endpoints are being reset. In most cases, the last printk() message is truncated in the middle, suggesting the system hangs while putting data into the UART (or perhaps the UART just stops transmitting). This seems to agree with comments from Bin Liu in the previous forum post.
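
As an alternative to printk() over the UART, the cdns3 and UDC trace events used later in this thread can be switched on through tracefs. A minimal sketch, assuming tracefs is mounted at /sys/kernel/tracing and that the event groups are named cdns3 and gadget (the UDC core's trace system); verify both under events/ on the target. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<tracefs>/events/<group>/enable" into `buf`.
 * Returns 0 on success or -ENAMETOOLONG if it does not fit. */
static int trace_enable_path(char *buf, size_t len, const char *tracefs,
                             const char *group)
{
    int n = snprintf(buf, len, "%s/events/%s/enable", tracefs, group);
    return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write a short string to an existing control file (tracefs or sysfs).
 * Returns 0 on success or a negative errno value. */
static int write_ctrl(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0)
        return -errno;
    size_t len = strlen(val);
    ssize_t n = write(fd, val, len);
    int err = (n >= 0 && (size_t)n == len) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return err;
}

/* Enable each event group in a NULL-terminated list, e.g. {"cdns3", "gadget", NULL}.
 * Stops at the first failure and returns its negative errno value. */
static int trace_enable_groups(const char *tracefs, const char *const *groups)
{
    char path[256];

    for (; *groups; groups++) {
        int err = trace_enable_path(path, sizeof(path), tracefs, *groups);
        if (err == 0)
            err = write_ctrl(path, "1");
        if (err) {
            fprintf(stderr, "enabling %s failed: %s\n", *groups, strerror(-err));
            return err;
        }
    }
    return 0;
}
```

Once enabled, the events can be followed live from trace_pipe in the same tracefs directory. The ring buffer is in RAM, so anything not read out before a hard hang is lost at the next power cycle.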

For now, the only workaround seems to be relying on a watchdog to reset the board when it hangs. I am hoping we can find another solution, but I am not sure what else to look at. Could one of the R cores somehow be interfering? Should we look at the voltage rails? I'm using the TMDS64EVM board, but we see the same problem on a custom board with the same SoC.
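
The watchdog workaround mentioned here can be driven from userspace through the standard Linux watchdog device API (/dev/watchdog with the WDIOC_SETTIMEOUT and WDIOC_KEEPALIVE ioctls). A minimal sketch; the function names and the 30 s timeout in the usage note are illustrative assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Open a watchdog device and request a timeout in seconds. Opening the
 * device arms it: if the caller stops calling wdt_pet() (for example because
 * the system has hung), the hardware resets the board when the timeout
 * expires. Returns the file descriptor, or -1 with errno set. */
static int wdt_open(const char *path, int timeout_s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The driver may round the value; it writes back the timeout it used. */
    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s) < 0) {
        int saved = errno;
        /* "Magic close": writing 'V' before close() asks the driver to disarm,
         * unless the kernel is built with CONFIG_WATCHDOG_NOWAYOUT. */
        if (write(fd, "V", 1) != 1) {
            /* best effort only */
        }
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Restart the watchdog countdown; call this well within the timeout. */
static int wdt_pet(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

A small supervisor would call wdt_open("/dev/watchdog", 30) once and then wdt_pet() every few seconds in a loop. This only detects the hang and resets the board; it does not address the underlying problem.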

Any tips for further debugging?

  • Hi Bin,

    Thanks, that indeed looks very promising. It applies cleanly on top of 6.12.y but unfortunately it still hangs in the same way.

    I do not see the dev_err(priv_dev->dev, "Failed to enable fast access\n") message being printed, but to be absolutely certain, I added my own printk() after this spot, confirming that fast access is enabled. Curiously, with this extra printk(), there seems to be no hang. So perhaps a delay is needed after enabling fast access? Or might this be another SR1.0 gremlin?

    diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
    index 93d95946fd58..b152303781f1 100644
    --- a/drivers/usb/cdns3/cdns3-gadget.c
    +++ b/drivers/usb/cdns3/cdns3-gadget.c
    @@ -3018,10 +3018,11 @@ static int cdns3_gadget_udc_start(struct usb_gadget *gadget,
                    if (ret) {
                            dev_err(priv_dev->dev, "Failed to enable fast access\n");
                            spin_unlock_irqrestore(&priv_dev->lock, flags);
                            return ret;
                    }
    +               printk("RFS: fast access ok\n");
            }
    
            switch (max_speed) {
            case USB_SPEED_FULL:
                    writel(USB_CONF_SFORCE_FS, &priv_dev->regs->usb_conf);

  • Edit: it hung while I was typing this reply, after more than 1000 iterations.

    And a second run hung after 120 iterations.

    So it seems there is still something else happening...

  • Hi Ralph,

    Can you please test with a different version of the patch attached below?

    diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
    index b1b46c7c63f8..8d9d5be64c5e 100644
    --- a/drivers/usb/cdns3/cdns3-gadget.c
    +++ b/drivers/usb/cdns3/cdns3-gadget.c
    @@ -2970,8 +2970,6 @@ static void cdns3_gadget_config(struct cdns3_device *priv_dev)
     	/* enable generic interrupt*/
     	writel(USB_IEN_INIT, &regs->usb_ien);
     	writel(USB_CONF_CLK2OFFDS | USB_CONF_L1DS, &regs->usb_conf);
    -	/*  keep Fast Access bit */
    -	writel(PUSB_PWR_FST_REG_ACCESS, &priv_dev->regs->usb_pwr);
     
     	cdns3_configure_dmult(priv_dev, NULL);
     }
    @@ -2989,6 +2987,21 @@ static int cdns3_gadget_udc_start(struct usb_gadget *gadget,
     	struct cdns3_device *priv_dev = gadget_to_cdns3_device(gadget);
     	unsigned long flags;
     	enum usb_device_speed max_speed = driver->max_speed;
    +	u32 reg;
    +	int ret;
    +
    +	/*  keep Fast Access bit */
    +	reg = readl(&priv_dev->regs->usb_pwr);
    +	if (!(reg & PUSB_PWR_FST_REG_ACCESS_STAT)) {
    +		writel(PUSB_PWR_FST_REG_ACCESS, &priv_dev->regs->usb_pwr);
    +		ret = readl_poll_timeout_atomic(&priv_dev->regs->usb_pwr, reg,
    +						(reg & PUSB_PWR_FST_REG_ACCESS_STAT),
    +						10, 1000);
    +		if (ret) {
    +			dev_err(priv_dev->dev, "Failed to enable fast access\n");
    +			return ret;
    +		}
    +	}
     
     	spin_lock_irqsave(&priv_dev->lock, flags);
     	priv_dev->gadget_driver = driver;
    @@ -3018,6 +3031,7 @@ static int cdns3_gadget_udc_start(struct usb_gadget *gadget,
     	}
     
     	cdns3_gadget_config(priv_dev);
    +	writel(USB_CONF_DEVEN, &priv_dev->regs->usb_conf);
     	spin_unlock_irqrestore(&priv_dev->lock, flags);
     	return 0;
     }
    @@ -3053,9 +3067,6 @@ static int cdns3_gadget_udc_stop(struct usb_gadget *gadget)
     		priv_ep->flags &= ~EP_CLAIMED;
     	}
     
    -	/* disable interrupt for device */
    -	writel(0, &priv_dev->regs->usb_ien);
    -	writel(0, &priv_dev->regs->usb_pwr);
     	writel(USB_CONF_DEVDS, &priv_dev->regs->usb_conf);
     
     	return 0;
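
    In this version the Fast Access request moves from cdns3_gadget_config() into cdns3_gadget_udc_start(), before the spinlock is taken, and the driver waits for PUSB_PWR_FST_REG_ACCESS_STAT with readl_poll_timeout_atomic() (10 us between reads, 1000 us timeout). The sketch below is a userspace model of that poll-with-timeout pattern for reasoning about it off-target; it is not the kernel macro, and the simulated register, the status bit value and all names are hypothetical.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <errno.h>
    #include <stdint.h>
    #include <time.h>

    /* Hypothetical status bit for this model only; not the real usb_pwr layout. */
    #define SIM_FST_ACCESS_STAT 0x4u

    /* Simulated register standing in for usb_pwr: reads back 0 for the first
     * `reads_until_set` reads, then `set_value`. */
    struct sim_reg {
        unsigned reads;
        unsigned reads_until_set;
        uint32_t set_value;
    };

    static uint32_t sim_reg_read(struct sim_reg *r)
    {
        r->reads++;
        return (r->reads > r->reads_until_set) ? r->set_value : 0;
    }

    /* Userspace stand-in for udelay(); the kernel's atomic variant busy-waits. */
    static void delay_us(unsigned us)
    {
        struct timespec ts = { .tv_sec = us / 1000000u,
                               .tv_nsec = (long)(us % 1000000u) * 1000L };
        nanosleep(&ts, NULL);
    }

    /* Approximate model of
     *   readl_poll_timeout_atomic(addr, val, (val & mask), delay, timeout)
     * Read the register, stop as soon as the bit is set, otherwise wait `delay`
     * microseconds and retry until about `timeout` microseconds of delay have
     * been spent. Returns 0 on success or -ETIMEDOUT; *val holds the last read. */
    static int poll_bit_timeout(struct sim_reg *r, uint32_t mask, uint32_t *val,
                                unsigned delay, unsigned timeout)
    {
        long left = (long)timeout;

        for (;;) {
            *val = sim_reg_read(r);
            if (*val & mask)
                return 0;
            if (left <= 0) {
                *val = sim_reg_read(r);   /* one last read before giving up */
                return (*val & mask) ? 0 : -ETIMEDOUT;
            }
            if (delay) {
                delay_us(delay);
                left -= (long)delay;
            } else {
                left--;                   /* no delay: budget counts iterations */
            }
        }
    }
    ```

    With a register that sets the bit after three reads, the model returns 0 on the fourth read; with one that never sets it, it returns -ETIMEDOUT after about 100 delays of 10 us. This bounds how long udc_start can stall on the bit, but it says nothing about why the hardware hangs.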
    

  • Hi Bin,

    Many thanks, but this "different version" still seems to hang, in this case after 44 iterations. Here is what I see with tracing enabled for the cdns3 and udc events. It doesn't always stop in the same place, but this pattern (three request allocations for a single ACM gadget, followed by usb_gadget_connect) is fairly common.

    44
            test2.sh-148     [000] ...1.    80.229859: cdns3_alloc_request: ep0: req: 000000007ddf24fb, req buff 0000000000000000, length: 0/0 zsi, status: 0, trb: [start:0, end:0], flags:0 SID: 0
            test2.sh-148     [000] ...1.    80.229920: usb_ep_alloc_request: ep0: req 000000007ddf24fb length 0/0 sgs 0/0 stream 0 zsI status 0 --> 0
            test2.sh-148     [000] ...1.    80.229963: usb_gadget_set_selfpowered: speed 0/3 state 0 0mA [sg:bus-powered:activated:disconnected] --> 0
            test2.sh-148     [000] ...1.    80.230032: cdns3_alloc_request: ep2in: req: 000000005304643c, req buff 0000000000000000, length: 0/0 zsi, status: 0, trb: [start:0, end:0], flags:0 SID: 0
            test2.sh-148     [000] ...1.    80.230034: usb_ep_alloc_request: ep2in: req 000000005304643c length 0/0 sgs 0/0 stream 0 zsI status 0 --> 0
            test2.sh-148     [000] ...1.    80.230060: cdns3_alloc_request: ep0: req: 00000000ca66c16d, req buff 0000000000000000, length: 0/0 zsi, status: 0, trb: [start:0, end:0], flags:0 SID: 0
            test2.sh-148     [000] ...1.    80.230062: usb_ep_alloc_request: ep0: req 00000000ca66c16d length 0/0 sgs 0/0 stream 0 zsI status 0 --> 0
            test2.sh-148     [000] ...1.    80.230136: usb_gadget_connect: speed 0/3 state 0 0mA [sg:bus-powered:activated:connected] --> 0
    

  • Hi Ralph,

    The second patch worked for a couple of customers who had the same issue. If it does not resolve the problem for you, we don't have any other software solution other than using a watchdog to reset the board.

  • Hi Bin:

    Ralph's entry raises a system-level concern: how long can an AM64x device configured as a USB gadget run before it is compromised (i.e., hung)? Can this be investigated for the Linux SDK v12 release (with a newer Linux kernel)? I am currently relying on the AM6442 being configured as a USB 2.0 ACM serial gadget at 480 Mb/s in an industrial environment, interfacing with a Windows 11 PC.

    Thanks

    Jim

  • Hi Jim,

    So far we have only observed the issue while the USB gadget controller driver is calling its start() and stop() functions, which basically means plugging and unplugging the USB cable, or Linux on the AM64x explicitly enabling and disabling the UDC, as Ralph's test script in his initial post above does.

    We are still investigating the issue internally but don't know the root cause yet. For now, the workaround is to use the watchdog to detect the problem and warm-reset the AM64x device and the system.

  • Hi Bin,

    For those customers who reported success with the second patch, do we know more details about their systems? Is it the EVM board or a custom board? Are they running a 6.12 kernel or some other version? I'm just trying to see what might be different.

    On my side, since my EVM has SR1.0, we are trying to run some tests on another board which has SR2.0, to see if that makes a difference (with the patch applied of course).

    Ralph

  • Hi Ralph,

    I checked the internal records: two customers reported the issue.

    One customer saw the issue on their custom board and the EVM, with SDK v8.6.0.42. The customer used the watchdog reset as the solution.

    The other customer had the issue on their custom board and EVM too, with SDK v9.0 and v9.2. The customer reported back that the second patch resolved this issue for their project.

    Since the issue seems to be in the processor, I don't think the kernel version matters much (if at all). But you might want to test with AM64x SR2.0 to see whether you still see the issue. Nobody should be using SR1.0 anyway.

  • Hello Ralph,

    Since you have an EVM on which you have observed the issue, can you please apply the following diff, which switches from the USB 2.0 PHY to the USB 3.0 PHY (SERDES)?

    diff --git a/arch/arm64/boot/dts/ti/k3-am642-evm.dts b/arch/arm64/boot/dts/ti/k3-am642-evm.dts
    index f6a76073ec11..cb042c232bd5 100644
    --- a/arch/arm64/boot/dts/ti/k3-am642-evm.dts
    +++ b/arch/arm64/boot/dts/ti/k3-am642-evm.dts
    @@ -615,15 +615,16 @@ &sdhci1 {
     &usbss0 {
     	bootph-all;
     	ti,vbus-divider;
    -	ti,usb2-only;
     };
     
     &usb0 {
     	bootph-all;
    -	dr_mode = "otg";
    -	maximum-speed = "high-speed";
    +	dr_mode = "peripheral";
    +	maximum-speed = "super-speed";
     	pinctrl-names = "default";
     	pinctrl-0 = <&main_usb0_pins_default>;
    +	phys = <&serdes0_usb_link>;
    +	phy-names = "cdns3,usb3-phy";
     };
     
     &cpsw3g {
    @@ -813,23 +814,22 @@ &main_timer11 {
     };
     
     &serdes_ln_ctrl {
    -	idle-states = <AM64_SERDES0_LANE0_PCIE0>;
    +	idle-states = <AM64_SERDES0_LANE0_USB>;
     };
     
     &serdes0 {
    -	serdes0_pcie_link: phy@0 {
    +	serdes0_usb_link: phy@0 {
     		reg = <0>;
     		cdns,num-lanes = <1>;
     		#phy-cells = <0>;
    -		cdns,phy-type = <PHY_TYPE_PCIE>;
    +		cdns,phy-type = <PHY_TYPE_USB3>;
     		resets = <&serdes_wiz0 1>;
     	};
     };
     
     &pcie0_rc {
    -	status = "okay";
    +	status = "disabled";
     	reset-gpios = <&exp1 5 GPIO_ACTIVE_HIGH>;
    -	phys = <&serdes0_pcie_link>;
     	phy-names = "pcie-phy";
     	num-lanes = <1>;
     };

    With the above changes, I no longer see a hang, even after around 20 minutes. Please check and let me know. While this isn't a fix, it will help us confirm whether the observed issue has something to do with the use of the USB 2.0 PHY.

    Regards,
    Siddharth.

  • Hi Siddharth,

    I've tested your patch (USB 2.0 PHY to USB 3.0 SERDES) on my EVM with SR1.0, and it seems to help: so far I have run over 3000 iterations (connect/disconnect) without any hang, and the test is still running.

    Meanwhile, my client who has SR2.0 on their custom board reports that they cannot use this patch, as their hardware is limited to USB 2.0. However, they did have success with the previous patch (v2 from Bin) on their SR2.0 hardware.

  • Hi Ralph,

    Great to hear the positive news. Looking forward to the final test result.

    I believe this new patch can still be tested on your client's board even though the board design is limited to USB 2.0, as long as the PCIe interface, which would need the SERDES, is not used.

  • Hi Bin,

    The client's board (with SR2.0) apparently has USB isolation; when they tried the USB 3.0 patch, USB was non-functional. But they are having success with your earlier patch ("different version" above).

    Meanwhile, on my EVM with SR1.0, the USB 3.0 patch seems to do the trick. It has now run 75,000 iterations over nearly 24 hours. I also notice that the host side still reports a high-speed device, not super-speed.

  • > I also notice that the host side still reports high-speed device detected, and not super-speed.

    Are the Serdes super-speed data lines routed to the USB connector on your board? Do you use a super-speed USB cable?

  • > Are the Serdes super-speed data lines routed to the USB connector? Do you use a super-speed USB cable?

    I don't have detailed schematics, but I suspect the answer is "no", based on the diagram at https://www.ti.com/document-viewer/lit/html/SPRUJ63A#GUID-1736BCF0-3226-43BC-A347-DE8F1364DAAB/TITLE-SPRUIM7T5586720-1

    Also, the connector on the EVM only has 5 pins, so it doesn't include the differential pairs for super-speed.

  • Oh, I didn't realize you worked on the EVM (though you mentioned it in your first post above).

    Yes, the EVM USB port is USB 2.0 only, so it makes sense that the host can only detect the EVM as high-speed.
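
    The negotiated speed can also be checked from the AM64x side: the UDC core publishes current_speed and maximum_speed attributes under /sys/class/udc/<udc>/, using strings such as high-speed and super-speed. With the SERDES patch applied on the EVM, maximum_speed would be expected to read super-speed while current_speed reads high-speed. A minimal sketch; the helper names are hypothetical, and the attribute paths should be verified on the target kernel.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Read <base>/<udc>/<attr> (e.g. /sys/class/udc/f400000.usb/current_speed)
     * into `out`, without the trailing newline. Returns 0, or -1 on error. */
    static int udc_read_attr(const char *base, const char *udc, const char *attr,
                             char *out, size_t len)
    {
        char path[256];
        int n = snprintf(path, sizeof(path), "%s/%s/%s", base, udc, attr);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;

        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        char *ok = fgets(out, (int)len, f);
        fclose(f);
        if (!ok)
            return -1;
        out[strcspn(out, "\n")] = '\0';
        return 0;
    }

    /* Order the speed strings the UDC core reports; 0 for "UNKNOWN" or others. */
    static int speed_rank(const char *s)
    {
        static const char *const names[] = {
            "low-speed", "full-speed", "high-speed", "super-speed", "super-speed-plus",
        };
        for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
            if (strcmp(s, names[i]) == 0)
                return (int)i + 1;
        return 0;
    }

    /* 1 if a link is up but slower than the controller's configured maximum. */
    static int udc_link_downgraded(const char *current, const char *maximum)
    {
        int cur = speed_rank(current), max = speed_rank(maximum);
        return cur > 0 && max > 0 && cur < max;
    }
    ```

    With the SERDES patch on the EVM, this check would be expected to report a downgrade, matching the USB 2.0-only connector.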