AM3352ZCZ USB Problem

Other Parts Discussed in Thread: AM3359

Hello folks,

we detected an odd behavior on the USB communication between our AM3352ZCZ (HS USB Host) board and an Atmel microprocessor (HS USB Device).

The differential pair is 55 mm long (very short), and the only components on the bus are the protection devices and the male/female USB connectors.

At random intervals of 20-30 s, we observe that a single packet is lost and the bus locks up.

If a USB cable (1.5 m in length) is inserted, the problem disappears (!!!) and no packet is lost.

The document SPRZ360C (AM335x ARM Silicon Revision 1.0 Silicon Errata) describes a similar problem,

Advisory 1.0.11: USB: Attached Non-compliant USB Device that Responds to Spurious Invalid Short Packet May Lock Up Bus.

Because there is a timing error in the TI USB PHY and an invalid short packet is sent, maybe this is what locks the bus.

The question is:

Why does the problem vanish with a USB cable in between? Is the added propagation delay helping, or am I completely off track?
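As a rough check of the delay idea, the sketch below compares the one-way propagation delay of the 55 mm trace and the 1.5 m cable with the high-speed bit time. The velocity factors (0.5c on the PCB, 0.65c in the cable) are assumptions chosen for illustration, not measured values for this board or cable.

```python
# Rough one-way propagation delay estimate.
# The velocity factors below are assumptions, not measurements.
C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm per ns


def one_way_delay_ns(length_mm, velocity_factor):
    """Delay of a line of the given length at a fraction of c."""
    return length_mm / (C_MM_PER_NS * velocity_factor)


# One bit at 480 Mb/s (USB high speed) lasts about 2.08 ns.
HS_BIT_TIME_NS = 1e3 / 480.0

trace_ns = one_way_delay_ns(55.0, 0.5)      # 55 mm PCB trace, assumed 0.5c
cable_ns = one_way_delay_ns(1500.0, 0.65)   # 1.5 m cable, assumed 0.65c

for name, delay in (("trace", trace_ns), ("cable", cable_ns)):
    round_trip_bits = 2.0 * delay / HS_BIT_TIME_NS
    print(f"{name}: one-way {delay:.2f} ns, round trip {round_trip_bits:.2f} bit times")
```

Under these assumptions the trace adds roughly 0.37 ns and the cable roughly 7.7 ns one way, so anything reflected from the far end returns within a fraction of a bit time on the short link and several bit times later with the cable.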

Thanks in advance for your help.

Regards,

Mauro Cleris

  • Hi Mauro,
     
    The Errata discusses the AM335x PHY receiver catching the end of its own transmit data. I think you will need to analyze waveforms on the USB lines, and maybe use a USB data analyzer to locate what's happening. You might also check whether the same problem exists with different USB devices and/or different lengths of USB cable.
     
    Best Regards
    Biser
  • We are experiencing the same behavior on USB port1. Short connections cause USB data transfer lock-up. Our USB connection is intended to consist of PCB traces only, i.e. the USB host and device are on the same PCB. Adding a length of cable to the port 1 USB connection fixes the lock-up. This behavior is observed with more than one USB device, i.e. connections to different controllers exhibit the same problem.

    The lock-ups may be due to the Advisory 1.0.11 bug; however, it is unlikely that the only two controllers we connected to both happened to issue the "non-compliant" response to the "invalid short packet". Additionally, silicon containing the Advisory 1.0.11 bug should favor short/fast physical connections, and we observe the opposite. We even tried connecting the USB device directly to the processor through short (< 0.3 inches) wires. It is unlikely that signal integrity is an issue with such short connections. We do not observe lock-ups on USB port1, but the PCB traces are longer on that port and we did not try a short connection there.

    Please advise. BTW, we are using "Experimental" Rev 1.0 parts, P/N XAM3359ZCZ. Also, please tell me if Advisory 1.0.11 is corrected on newer revision parts.

  • The issue described in Advisory 1.0.11 has never been observed on AM335x. This advisory was included as a precautionary measure because the issue was observed on another device that shares the same PHY. It's important to note that even on the affected device, a 'perfect storm' of related problems had to be present before the issue described in the advisory would occur: a poor board layout resulting in impedance discontinuities between DP/DM, pre-production parts that had not been trimmed to account for manufacturing variances, and the improper use of series termination resistors on the USB DP/DM lines.

    The usage scenario you describe (short USB traces with no physical connector) is very common and we have several customers in mass-production with no problems reported. BeagleBone is also one such device. The fact that your issue is resolved by adding trace length points to a signal integrity issue, and my first inclination would be to check for impedance discontinuities (90 Ohm differential is required) or improper series termination of the DP/DM lines.
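    To illustrate why an impedance discontinuity matters, the sketch below computes the voltage reflection coefficient at a step from the nominal 90 Ohm differential impedance to a mismatched section. The 110 Ohm value is a hypothetical example, not a measurement from any board in this thread.

```python
def reflection_coefficient(z_load_ohm, z0_ohm):
    """Voltage reflection coefficient at a step from z0_ohm into z_load_ohm."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


Z0_DIFF_OHM = 90.0          # required USB differential impedance
Z_MISMATCH_OHM = 110.0      # hypothetical discontinuity, for illustration only

gamma = reflection_coefficient(Z_MISMATCH_OHM, Z0_DIFF_OHM)
print(f"reflected fraction of the incident voltage: {gamma:.3f}")
```

    A perfectly matched section gives a coefficient of zero; the larger the mismatch, the larger the fraction of the incident edge that travels back toward the transmitter.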

  • Hmm.... If the issue has never been observed on the AM3359, why is the advisory checked off in the Rev 1.0 column of the Silicon Revision Affected part of the Silicon Errata Document, and why is the Rev 2.0 column marked as TBD? The fact that the problem has not been observed does not mean that it won't occur, especially since the AM3359 shares the same PHY with a device that DOES exhibit the problem.

    Your post indicates that I should check my impedances, which I have done several times. You state that part of the "perfect storm" is the apparent fact that pre-production parts have not been trimmed. We are in fact using pre-production parts, XAM3359ZCZ to be exact. How far off in value can the untrimmed USB terminations be?

    Additionally, where is the fact that the pre-production parts have not been trimmed documented? Is it necessary for the calibration-related register settings to be treated specially? And most importantly, can the error be compensated for by changing register settings?

    As far as your statement goes, "The fact that your issue is resolved by adding trace length points to a signal integrity issue".

    I'd like to reiterate that the USB bus locks up when operating over very short direct connections to the external USB device and we tried more than one type of device. In this case, SI should be excellent as long as the on-die terminations are accurate. Adding a cable should make the SI worse. In reality adding cable makes the performance BETTER.

  • I'd also like to state that if Advisory 1.0.11 is due to a timing problem, post-layout timing analysis of the device should indicate whether the problem exists or not. Can you tell me if worst-case timing is met on XAM3359ZCZ devices or not? If not, the unwanted behavior detailed in the advisory can occur. Correct?

  • Allow me to respond inline (below):

    Hmm.... If the issue has never been observed on the AM3359, why is the advisory checked off in the Rev 1.0 column of the Silicon Revision Affected part of the Silicon Errata Document, and why is the Rev 2.0 column marked as TBD? The fact that the problem has not been observed does not mean that it won't occur, especially since the AM3359 shares the same PHY with a device that DOES exhibit the problem.

    [DK] As I mentioned in my initial response to the thread, we listed the errata purely as a precaution given that the issue was seen on another, similar, device. This issue has not been seen on any variation of the AM335x device. As Rev 2.0 had several changes to the die itself, we thought it prudent to leave the advisory as-is until we were positive that it too would not be affected.

    Your post indicates that I should check my impedances, which I have done several times. You state that part of the "perfect storm" is the apparent fact that pre-production parts have not been trimmed. We are in fact using pre-production parts, XAM3359ZCZ to be exact. How far off in value can the untrimmed USB terminations be?

    Additionally, where is the fact that the pre-production parts have not been trimmed documented? Is it necessary for the calibration-related register settings to be treated specially? And most importantly, can the error be compensated for by changing register settings?

    [DK] All AM335x devices are trimmed prior to release to the market. The untrimmed, pre-production parts mentioned earlier were on a different device that did exhibit this issue, not AM335x.

    As far as your statement goes, "The fact that your issue is resolved by adding trace length points to a signal integrity issue".

    I'd like to reiterate that the USB bus locks up when operating over very short direct connections to the external USB device and we tried more than one type of device. In this case, SI should be excellent as long as the on-die terminations are accurate. Adding a cable should make the SI worse. In reality adding cable makes the performance BETTER.

    [DK] Short traces do not guarantee signal integrity, and adding to the total trace length can influence the susceptibility of a device to reflections caused by an impedance mismatch (among other things). I read that you have checked your impedance; have you by chance run an eye-diagram test for the interface?

  • Thank you for your response. If Advisory 1.0.11 was included as a precautionary measure, why is the advisory status for Rev 2.0 shown as TBD?

    Anyway, I have checked the USB eye pattern at the processor, and when running at 480 Mbps it is compliant with the USB 2.0 far-end mask. One behavior that is not consistent with signal integrity as the cause is that the USB lock-ups do not occur after the link is reset by shorting DP to DM (the USB spec states that this is acceptable practice). In other words, to get the link to operate properly, either a cable must be added or the link must be physically reset.

    Could this be an issue with calibration of the PHY? That is, do the PHY calibrations have to be initiated by the firmware, or do they take place automatically? (We are operating with a new USB device driver, and it is possible that the USB PHY is not being initialized or handled properly.) We have extensively verified that adding a cable and/or shorting DM to DP reduces the USB lock-ups dramatically, and that shortening the connection between the processor and the USB device increases them.

  • Still waiting for a response.

  • Is it running Linux and using the SDK kernel release by TI? Does the USB device use a standard USB class?

    I am wondering if the PHY setup/init in your OS is not done properly if it is not TI SDK kernel.

  • Thank you for your response, Bin. We are running Green Hills with a new USB driver and will verify the handling of the PHY-related registers.

  • Hi there,


    just wondering if there was any resolution to this problem as we also have a custom AM3359 board exhibiting the same problems on USB0.


    The board has an FTDI4232 USB quad serial adapter on board. The DP/DM traces are 4 inches long and are routed mostly on an internal layer.


    The signal integrity looks good and the differential impedance is correct.


    When connecting an external device to the board via the USB host connector connected to USB1 all is fine.


    Some odd things to note:

    Failures _always_ occur during the serial port open or close calls. If a port is successfully opened, communication never fails.

    If I change the kernel driver to force enumeration at 12Mb, the open/close failures disappear but enumeration at boot sometimes fails. So it seems that the problem moves with 12Mb enumeration.

    The problem can be reproduced with a python script that simply opens and closes one of the USB serial ports.

    You might not notice this problem unless you were looking for it as it is fleeting but reliability is very important for this board.
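    A reproduction script of the kind described above could look like the following minimal sketch. It uses only the Python standard library rather than any particular serial package; the function name and the choice of open flags are assumptions made for illustration, not the script actually used on this board.

```python
import os
import time


def open_close_cycles(path, cycles, pause_s=0.0):
    """Open and close `path` repeatedly.

    Returns (completed_cycles, error), where error is None if every
    cycle succeeded or the OSError raised by the first failing call.
    """
    # O_NOCTTY keeps the tty from becoming the controlling terminal;
    # O_NONBLOCK avoids waiting for carrier detect on open.
    flags = os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK
    for completed in range(cycles):
        try:
            fd = os.open(path, flags)
            os.close(fd)
        except OSError as exc:
            return completed, exc
        if pause_s:
            time.sleep(pause_s)
    return cycles, None
```

    For example, `open_close_cycles("/dev/ttyUSB1", 1000)` would report how many cycles completed before the first failed open or close, which makes it easy to compare boards or driver changes.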

    Thanks,
    Mike Cruse

  • Just a quick clarification: the external device I was referring to used the same FTDI part as the one used on board.

  • A little more information...


    We are running kernel version 3.12.6 with the latest bone patches from Robert Nelson. I have also tried using the musb driver from the TI SDK. No change in behavior.


    While running the USB serial port open/close test the musb driver is occasionally logging these messages:

    Jan 03 14:00:14 pkg kernel: musb_host_rx 1738: Rx interrupt with no errors or packet!
    Jan 03 14:00:31 pkg kernel: musb_ep_program 897: broken !rx_reinit, ep2 csr 0003

    These messages do not seem to be present when the USB subsystem stalls. At the time of the stall the following messages are logged:

    [  104.860421] ftdi_sio ttyUSB1: urb failed to clear flow control
    [  114.868437] ftdi_sio ttyUSB1: failed to get modem status: -110
    [  119.876422] ftdi_sio ttyUSB1: error from flowcontrol urb
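    The -110 in the second message is a negated Linux errno value; 110 is ETIMEDOUT, i.e. the control transfer timed out. The sketch below pulls the timestamp, device, and error name out of such lines. The regular expression and the small errno table are assumptions written for this log format, not part of any kernel tooling.

```python
import re

# Subset of Linux errno values (asm-generic numbering), hardcoded so the
# result does not depend on the platform running this script.
LINUX_ERRNO_NAMES = {19: "ENODEV", 32: "EPIPE", 71: "EPROTO", 110: "ETIMEDOUT"}

LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<dev>\S+):\s+(?P<msg>.*\S)"
)
CODE_RE = re.compile(r":\s*-(\d+)\s*$")


def parse_kernel_line(line):
    """Return a dict describing one dmesg-style line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = CODE_RE.search(m.group("msg"))
    errno_name = None
    if code:
        errno_name = LINUX_ERRNO_NAMES.get(int(code.group(1)), "unknown")
    return {
        "time": float(m.group("time")),
        "driver": m.group("driver"),
        "dev": m.group("dev"),
        "errno": errno_name,
    }


sample = "[  114.868437] ftdi_sio ttyUSB1: failed to get modem status: -110"
print(parse_kernel_line(sample))
```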

    We tried shorting DP/DM as mentioned by Chest Jual above, and indeed the USB bus does not fail with the on-board part after that. I tried this many times to be absolutely sure. The board under test will fail within about 20 open/close test cycles. But once the failure occurs, shorting DP/DM will cause the FTDI part to be disconnected and then re-enumerated. After that point a million test cycles will complete successfully time after time.

    Is there some re-calibration occurring after shorting DP/DM that does not happen during boot?

    Is there a simple way to dump the PHY registers to see what might have changed?


    Any help would be greatly appreciated.

    Regards,


    Mike Cruse

  • One more thing...

    I also tried resetting the USB device, both under software control and by manually asserting the FTDI reset line when the problem occurred.

    This was done in the hope that I could achieve similar results (i.e. no issue afterwards) to the shorting of DP/DM.


    In both cases the FTDI device disconnected and re-enumerated but the problem still persists afterwards.

    So there seems to be something good happening specifically, and only with shorting DP/DM.


    What could that be?

  • I have done a little more poking around. I am not sure if this is useful but here is a dump of the PHY registers for USB PHY0

    Phy0 Termination: 00000040
    Phy0 RX_CALIB:    00000056
    Phy0 DLLHS_2:     0000001f
    Phy0 RX_TEST_2:   00000034
    Phy0 CHRG_DET:    00000000
    Phy0 PWR_CNTL:    00000040
    Phy0 UTMI_1:      00000000
    Phy0 UTMI_2:      00000000
    Phy0 BIST:        00000000
    Phy0 BIST_CRC:    000000ff
    Phy0 CDR_BIST:    00000000
    Phy0 GPIO:        00000000
    Phy0 DLLHS:       00000000
    Phy0 CM_TRIM:     00000050
    Phy0 CM_CONFIG:   00000002
    Phy0 USBOTG:      000000c0
    Phy0 AD_1:        00000045
    Phy0 AD_2:        00000081
    Phy0 AD_3:        00000000
    Phy0 ANA_CONF1:   00000000
    Phy0 ANA_CONF2:   00000000

    I dumped these registers before and after the shorting of DP/DM fix and there were no real changes. Both Termination and RX_TEST_2 jump around a bit but all the other registers are the same before and after.
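    Comparing two dumps by eye is error-prone, so a small script can do it. The sketch below parses dumps in the "Phy0 NAME: value" format shown above and reports registers whose values differ. The "after" values in the example are hypothetical placeholders, not readings from this board.

```python
import re

DUMP_RE = re.compile(r"^\s*Phy\d+\s+(?P<name>\w+):\s+(?P<value>[0-9a-fA-F]+)\s*$")


def parse_dump(text):
    """Map register name -> integer value for every matching dump line."""
    regs = {}
    for line in text.splitlines():
        m = DUMP_RE.match(line)
        if m:
            regs[m.group("name")] = int(m.group("value"), 16)
    return regs


def diff_dumps(before, after):
    """Return {name: (before_value, after_value)} for registers that differ."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed


before = parse_dump("""
Phy0 Termination: 00000040
Phy0 RX_CALIB:    00000056
Phy0 RX_TEST_2:   00000034
""")
# Hypothetical second dump, for illustration only.
after = parse_dump("""
Phy0 Termination: 00000042
Phy0 RX_CALIB:    00000056
Phy0 RX_TEST_2:   00000034
""")
for name, (old, new) in diff_dumps(before, after).items():
    print(f"{name}: {old:#010x} -> {new:#010x}")
```

    Running several dumps in a row before the DP/DM short also helps separate registers that fluctuate on their own from ones that genuinely change.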

    I can reproduce the removal of the problem after shorting DP/DM on multiple boards now.

    We may have a signal integrity problem but I would sure like to know why this operation helps.


    Is there anything else I can look at?


    Thanks,


    Mike