USB0 PLL frequency drift across temperature swings

At low temperature we systematically ran into the issue documented as Advisory 3.0.24 on p. 23 of the OMAP-L137 silicon errata. For now we simply work around it by asserting the reset of the USB0 PHY for 500ms every minute. We are now looking into a more efficient way to implement the workaround in the Linux driver, but it's not clear what the proper parameters would be, mainly:

  • Width of the reset pulse. We tested 1ms as coded in the errata on p. 24. That wasn't enough. 500ms (as commented) did the trick. The technical reference manual says a few clock cycles (p. 1428), but that's the only specification we could find.
  • How long we should wait after asserting the reset before checking for the condition again.

  • Hi Louis,

    At which clock frequency is your core operating?

    Are you getting any kernel error messages from the USB driver when this situation occurs?

  • Hi Titusrathinaraj,

    The core is operating at 400MHz. The Linux root file system is mounted on a Secure Digital medium, itself accessible over USB through a USB2640i controller, so a bunch of IO errors are outputted by the kernel whenever the medium is accessed. We confirmed that the issue comes neither from the Secure Digital medium nor from its connector. The hub section of the USB2640i could have been the culprit, but implementing the workaround from the silicon errata did fix the issue, so it looks like we are subject to Advisory 3.0.24.

  • Hi Louis,

    Louis said:

    bunch of IO errors are outputted by the kernel whenever the medium is accessed.

    Could you please attach the error log (the errors printed when the USB needs a reset)?

    I think we can try to implement the "USB reset" code snippet in the USB driver (as per the errata advisory), triggered after a USB error is received. There we have to check whether the USB error is really due to the "low temp" issue or to something else; if it is due to the "temp" issue, then we call the "usb_reset" function to reset the PHY.

    Currently, how did you implement the fix in the driver?

  • I was in the process of doing precisely that. Right now it's just a simple driver called every 60 seconds from user space. When called, the driver asserts the PHY reset for 500ms by setting then clearing the 9th bit of the CFGCHIP2 register.

    It's not very clean. It was just to verify that it was indeed the issue documented by the errata.

    I'll try to provide representative logs including the errors thrown by the USB driver itself, not by the higher levels. It takes a little while to expose the unit to the temperature swing.
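
    For reference, the set, wait, clear sequence described in this post could be sketched as follows. This is only a host-side illustration under stated assumptions: the mask is passed in by the caller because the exact bit should be taken from the CFGCHIP2 description in the technical reference manual, `host_delay_ms` is a hypothetical stand-in for the kernel's mdelay(), and the register is an ordinary variable rather than a mapped address.

```c
#include <stdint.h>
#include <time.h>

/* Callback type for the wait between asserting and releasing the bit. */
typedef void (*delay_ms_fn)(unsigned int ms);

/* Set the bit(s) selected by mask; all other CFGCHIP2 bits are preserved. */
static void phy_pwrdn_set(volatile uint32_t *cfgchip2, uint32_t mask)
{
    *cfgchip2 |= mask;
}

/* Clear the bit(s) selected by mask; all other CFGCHIP2 bits are preserved. */
static void phy_pwrdn_clear(volatile uint32_t *cfgchip2, uint32_t mask)
{
    *cfgchip2 &= ~mask;
}

/* Hypothetical host-side delay (busy-wait on clock()); a driver would
 * use mdelay() or msleep() instead. */
static void host_delay_ms(unsigned int ms)
{
    clock_t ticks = (clock_t)((double)ms * CLOCKS_PER_SEC / 1000.0);
    clock_t start = clock();

    while (clock() - start < ticks) {
        /* spin */
    }
}

/* Assert the bit, hold it for width_ms, then release it. */
static void phy_reset_pulse(volatile uint32_t *cfgchip2, uint32_t mask,
                            unsigned int width_ms, delay_ms_fn delay_ms)
{
    phy_pwrdn_set(cfgchip2, mask);
    delay_ms(width_ms);
    phy_pwrdn_clear(cfgchip2, mask);
}
```

    The pulse width is kept as a parameter because it is exactly the value under discussion here (1ms as coded in the errata versus the 500ms that worked on this setup).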

  • Hi Louis

    What is the status of this issue? The field team escalated this thread.
    Please let us know what additional clarification you are looking for on this.

    The errata mentions a 500 msec wait in the PHY reset call:

    void phy_reset(void) {
    CFGCHIP2 |= USBPHY_PHYPDWN; /* Power down the USB PHY */
    mdelay(1); /* Wait 500ms */
    CFGCHIP2 &= ~USBPHY_PHYPDWN; /* Power up the USB PHY */
    }

    So this is the right/expected time for which you should assert before deasserting (not 1 msec).

    Is there any other specifics that you need with respect to USB PHY reset timings?

    Regards

    Mukul 

  • Additionally, in terms of implementing workarounds, I think different customers have implemented different schemes.
    Another one, which we decided not to publish in our errata but which exists in another device's errata, can be found here:

    http://www.ti.com/lit/er/sprz294e/sprz294e.pdf

    If by any chance your board has a temperature sensor, you could possibly use it to decide when to initiate these PHY resets, if doing them periodically is not a viable option for you.
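
    One way to read the temperature-sensor suggestion is to trigger a PHY reset once the temperature has moved by more than some amount since the previous reset. The sketch below only illustrates that decision; the function name, the millidegree units, and the threshold are all hypothetical, and nothing in this thread or the errata specifies a particular threshold.

```c
#include <stdbool.h>

/* Returns true when the temperature has moved by at least swing_limit_mc
 * (millidegrees Celsius) since the last PHY reset, in either direction.
 * All names and the choice of units are assumptions for illustration. */
static bool temp_swing_needs_reset(int temp_at_last_reset_mc,
                                   int temp_now_mc,
                                   int swing_limit_mc)
{
    /* Widen before subtracting so extreme inputs cannot overflow. */
    long long delta = (long long)temp_now_mc - (long long)temp_at_last_reset_mc;

    if (delta < 0)
        delta = -delta;

    return delta >= (long long)swing_limit_mc;
}
```

    The caller would record the temperature each time it resets the PHY and pass it back in on the next check.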

  • Thanks Mukul,

    The comment says 500ms but the code itself (mdelay(1);) generates a 1ms delay (same for sprz294e, actually). Our setup is a little tricky as the root file system is accessible over the USB link, so what applies to other customers might not apply to us. Although most of the time the higher-level drivers are able to buffer and compensate for the USB re-enumeration, that's not always the case. Also note that under normal conditions up to 35 seconds can pass on our setup without seeing any new USB interrupts, so the details provided in sprz294e on p. 18 (2-5 seconds without seeing new interrupts) might not work for everybody.

    We were in the process of implementing everything in the USB driver itself, while ensuring USB activity at a low level to keep a continuous interrupt flow, but we decided against it. We'd rather try to move the root file system to another medium, keeping on the USB medium only what is accessed sporadically under application control in between re-enumerations.

    Still we'd like to get the timing right before trying to mount anything from that medium. As given here I guess we should wait to ensure that the PLL clock is stable before allowing these operations? Anything else we should check or do after the reset?
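
    The "no new interrupts for N seconds" check mentioned in this post can be sketched as a simple idle test. This is a minimal illustration, not the sprz294e code: the function name and the millisecond tick source are assumptions, and the limit has to be chosen per system (on this setup it would need to exceed the 35 seconds of normal idle time).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when more than idle_limit_ms have elapsed since the last
 * USB interrupt. Timestamps come from a free-running millisecond counter
 * (hypothetical); unsigned subtraction keeps the result correct across a
 * single counter wraparound. */
static bool usb_irq_idle_too_long(uint32_t now_ms, uint32_t last_irq_ms,
                                  uint32_t idle_limit_ms)
{
    return (uint32_t)(now_ms - last_irq_ms) > idle_limit_ms;
}
```

    The interrupt handler would store the current tick in `last_irq_ms`, and a periodic check would call this function before deciding whether a PHY reset is warranted.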

  • Hi Louis

    Thanks for the additional information. I looked through the logs on this errata and did not find anything that says we need to wait 500 ms between PHY power down and power up. The mdelay of 1 msec should be sufficient.

    After the PHY reset, you can go through the standard initialization and re-enumeration process as listed in the user guide or in the code snippets you showed. Polling for the CFGCHIP2.USB0PHYCLKGD bit ensures that the PHY is properly clocked and the USB PLL is locked.

    Hope this helps.

    Regards

    Mukul 
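
    The polling step mentioned above could look roughly like the following bounded loop. It is a sketch under stated assumptions: the bit position used for USB0PHYCLKGD (bit 17) should be verified against the CFGCHIP2 description in the technical reference manual, the register is passed in as a plain pointer, and a real driver would add a short delay or sleep between polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of USB0PHYCLKGD in CFGCHIP2; verify against the TRM. */
#define USB0PHYCLKGD (1u << 17)

/* Poll CFGCHIP2 up to max_polls times. Returns true as soon as the
 * clock-good bit reads as set, false if it never does. */
static bool wait_for_phy_clock_good(const volatile uint32_t *cfgchip2,
                                    unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        if (*cfgchip2 & USB0PHYCLKGD)
            return true;
        /* A driver would delay or sleep here between polls. */
    }
    return false;
}
```

    Bounding the loop means that a PLL which never locks is reported as a failure instead of hanging the caller.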

  • Hi Mukul,

    In our case 1ms isn't enough but 500ms fixed the issue.

    Louis

  • Hi Louis

    Can you share your code snippets, including the code used to init USB post reset? Do you wait for PHYCLKGD in both cases (1 ms and 500 ms)?

    Regards

    Mukul

  • The root file system doesn't rely on the USB link anymore, but it still doesn't look like there's any way to recover the USB link besides a (soft) reboot. At one point during the temperature swing the kernel will issue:

    usb 1-1.1: USB disconnect, address 3
    usb 1-1: reset high speed USB device using musb_hdrc and address 2
    usb 1-1: device not accepting address 2, error -71
    usb 1-1: reset high speed USB device using musb_hdrc and address 2
    usb 1-1: device not accepting address 2, error -71
    usb 1-1: reset high speed USB device using musb_hdrc and address 2
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: reset high speed USB device using musb_hdrc and address 2
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: device descriptor read/64, error -71
    hub 1-1:1.0: hub_port_status failed (err = -19)
    hub 1-1:1.0: hub_port_status failed (err = -19)
    hub 1-1:1.0: hub_port_status failed (err = -19)
    hub 1-1:1.0: activate --> -19
    usb 1-1: USB disconnect, address 2
    usb 1-1: new high speed USB device using musb_hdrc and address 4
    usb 1-1: device not accepting address 4, error -71
    usb 1-1: new high speed USB device using musb_hdrc and address 5
    usb 1-1: device not accepting address 5, error -71
    usb 1-1: new high speed USB device using musb_hdrc and address 6
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: new high speed USB device using musb_hdrc and address 7
    usb 1-1: device descriptor read/64, error -71
    usb 1-1: device descriptor read/64, error -71
    hub 1-0:1.0: unable to enumerate USB device on port 1

    After which the PHY reset is asserted:

    *(gCFGCHIP2) |= USBPHY_PHYPDWN; /* Power down the USB PHY */
    mdelay(500); /* Wait 500ms */
    *(gCFGCHIP2) &= ~USBPHY_PHYPDWN; /* Power up the USB PHY */
    printk(KERN_ALERT "USB Reset\n");

    and "USB Reset" does show up in the kernel logs, but I can't get it to re-enumerate the USB devices. However, I found a way to reproduce the condition at room temperature:

    echo 0 > /sys/devices/platform/musb_hdrc/usb1/authorized
    echo 1 > /sys/devices/platform/musb_hdrc/usb1/authorized

    It kind of looks like an issue with mdev used along with TI's driver, so this might not be the proper thread. Still, what would be the recommended procedure to force re-enumeration?
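
    For anyone wanting to script the reproduction from a program rather than the shell, the two echo commands above amount to writing "0" and then "1" to the `authorized` attribute. The sketch below is only that equivalent, with hypothetical helper names; it is not a recommended recovery procedure, and the question of how to force re-enumeration remains open.

```c
#include <stdio.h>

/* Write a single character followed by a newline to the given file.
 * Returns 0 on success, -1 on any I/O error. */
static int write_flag(const char *path, char value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;

    ok = (fputc(value, f) != EOF) && (fputc('\n', f) != EOF);
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Equivalent of "echo 0 > path; echo 1 > path" for a sysfs
 * "authorized" attribute such as the one used above. */
static int usb_toggle_authorized(const char *path)
{
    if (write_flag(path, '0') != 0)
        return -1;
    return write_flag(path, '1');
}
```

    On the target it would be called with the path used above, /sys/devices/platform/musb_hdrc/usb1/authorized, and requires root privileges.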