
AM37x EHCI Controller Lockup

Other Parts Discussed in Thread: TPS65921, TUSB1210

SW is reporting an EHCI lockup in which bit 8 (PR, Port Reset) of the PORTSC_i register shows the port in reset, and it never comes out. Here is the dump from our SW guy...

OK, I found the lockup. It is in the USB EHCI controller register at address 0x48064858.

The current register value is 0x00001105 (the third word of the 0x48064850 row below):

___address____|________0________4________8________C_0123456789ABCDEF
  NSD:48064830| 00000000 00000000 00000000 00000000 ................
  NSD:48064840| 00000000 00000000 00000000 00000000 ................
  NSD:48064850| 00000001 00001000 00001105 00001000 ................
  NSD:48064860| 00000000 00000000 00000000 00000000 ................
  NSD:48064870| 00000000 00000000 00000000 00000000 ................
  NSD:48064880| 00000000 00000000 00000000 00000000 ................
  NSD:48064890| 00000000 00200040 00000080 00000001 ....@. .........
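To make the value easier to read, here is a minimal C sketch that names the flags using the PORTSC bit positions from the EHCI 1.0 specification (CCS bit 0, CSC bit 1, PED bit 2, PEDC bit 3, Suspend bit 7, PR bit 8, Line Status bits 11:10, PP bit 12, Port Owner bit 13). The macro and function names are hypothetical, not the HCC stack's own. Decoded this way, 0x00001105 is CCS, PED, PR and PP: a device is connected and the port is powered and enabled, but Port Reset is still asserted. The adjacent 0x00001000 words have only PP set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* EHCI 1.0 PORTSC bit positions (names are illustrative, not HCC's). */
#define PORTSC_CCS  (1u << 0)   /* Current Connect Status */
#define PORTSC_CSC  (1u << 1)   /* Connect Status Change */
#define PORTSC_PED  (1u << 2)   /* Port Enabled/Disabled */
#define PORTSC_PEDC (1u << 3)   /* Port Enable/Disable Change */
#define PORTSC_SUSP (1u << 7)   /* Suspend */
#define PORTSC_PR   (1u << 8)   /* Port Reset */
#define PORTSC_PP   (1u << 12)  /* Port Power */
#define PORTSC_PO   (1u << 13)  /* Port Owner */

/* True while the controller still reports the port as being in reset. */
static bool portsc_in_reset(uint32_t v)
{
    return (v & PORTSC_PR) != 0;
}

/* Line Status, bits 11:10 (bit 11 = D+, bit 10 = D-); 0 means SE0.
 * Only meaningful while PED = 0 and CCS = 1. */
static unsigned portsc_line_status(uint32_t v)
{
    return (v >> 10) & 0x3u;
}

/* Write the names of the set flags into buf, space separated.
 * A 32-byte buffer holds the longest possible list. */
static void portsc_describe(uint32_t v, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { PORTSC_CCS, "CCS" },   { PORTSC_CSC, "CSC" },   { PORTSC_PED, "PED" },
        { PORTSC_PEDC, "PEDC" }, { PORTSC_SUSP, "SUSP" }, { PORTSC_PR, "PR" },
        { PORTSC_PP, "PP" },     { PORTSC_PO, "PO" },
    };
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(v & flags[i].mask))
            continue;
        const char *name = flags[i].name;
        size_t name_len = strlen(name);
        size_t need = name_len + (used ? 1 : 0);
        if (used + need + 1 > len)
            break;                      /* out of room: stop cleanly */
        if (used)
            buf[used++] = ' ';
        memcpy(buf + used, name, name_len + 1);   /* copies the NUL too */
        used += name_len;
    }
}
```

For example, portsc_describe(0x00001105u, buf, sizeof buf) with a 32-byte buf leaves "CCS PED PR PP" in buf.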

 

The code that tests the problem bit is below (the tst r0,#0x100 instruction at NSR:80027250). FYI, this is the HCC stack code, not the application code.

_addr/line__|code_____|label___________________|mnemonic________________|comment
NSR:80027214|E3A02C01                           mov     r2,#0x100
NSR:80027218|E1A01007                           cpy     r1,r7
NSR:8002721C|E1A00008                           cpy     r0,r8
NSR:80027220|EBFFFFC2                           bl      0x80027130       ; _ehci_set_portsc_bit
         135|      oal_task_sleep( 50 );                     /* wait 50ms for the completion of the rese
NSR:80027224|E3A00032                           mov     r0,#0x32
NSR:80027228|EB00DC95                           bl      0x8005E484       ; oal_task_sleep
         136|      _ehci_clear_portsc_bit( unit, port, PR ); /* clear reset - required in some systems
NSR:8002722C|E3A02C01                           mov     r2,#0x100
NSR:80027230|E1A01007                           cpy     r1,r7
NSR:80027234|E1A00008                           cpy     r0,r8
NSR:80027238|EBFFFFB5                           bl      0x80027114       ; _ehci_clear_portsc_bit
            |
            |      /* wait reset to complete */
            |      do
            |      {
         141|        rc = ehci_hc_state( unit );
NSR:8002723C|E1A00008                           cpy     r0,r8
NSR:80027240|EBFFF9FF                           bl      0x80025A44       ; ehci_hc_state
NSR:80027244|E1B04000                           movs    r4,r0
            |      }
         143|      while ( rc == USBH_SUCCESS && _ehci_gb32_reg( PORTSC( port ), PR ) );
NSR:80027248|1A000010                           bne     0x80027290
NSR:8002724C|E5950850                           ldr     r0,[r5,#0x850]
NSR:80027250|E3100C01                           tst     r0,#0x100        ; <-- tests bit 8 (PR) of the USB PORTSC_i register
NSR:80027254|1AFFFFF8                           bne     0x8002723C
            |    }

 

And I finally took screenshots from the reference manual showing that the port is in reset according to the register value (bit 8). For background: when we originally brought up this USB design, this problem was caused by the ULPI/PHY misbehaving. I do not know whether that is the cause today, as it could also be the different hub that is now being used. I tried terminating the reset by writing a zero to the bit, but the reset does not clear.

The screenshot will not copy into here, but it's just the register description from the reference manual. The PHY we are using is the USB3320 and the Hub is the USB2512 both from SMSC.

We have come to the point where help is needed. I have looked for HW issues with no luck so far. It seems to be internal to the 3703 but of course could still be hardware. Can someone help?
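The do/while at source lines 141-143 of the listing above never exits if PR stays set, so a stuck controller hangs the whole stack. A bounded wait would at least turn the hang into a reportable error. Below is a minimal C sketch, assuming a 1 ms delay primitive: read_portsc_fn, delay_ms_fn and the fake_port simulator are hypothetical stand-ins for _ehci_gb32_reg and oal_task_sleep, not the HCC API, and the per-iteration ehci_hc_state() check is left out for brevity. The EHCI 1.0 specification requires the controller to complete the reset within 2 ms of software writing PR = 0, so a timeout of a few milliseconds is already past the spec limit.

```c
#include <stdint.h>

#define PORTSC_PR (1u << 8)   /* EHCI PORTSC Port Reset bit (0x100) */

/* Hypothetical hooks standing in for the stack's register read
 * (_ehci_gb32_reg) and sleep (oal_task_sleep) primitives. */
typedef uint32_t (*read_portsc_fn)(void *ctx);
typedef void (*delay_ms_fn)(void *ctx, unsigned ms);

enum wait_result { WAIT_DONE = 0, WAIT_TIMEOUT = -1 };

/* Poll PORTSC until PR reads back as 0, giving up after timeout_ms.
 * The last value read is stored in *last so it can be logged. */
static enum wait_result wait_reset_done(read_portsc_fn rd, delay_ms_fn delay,
                                        void *ctx, unsigned timeout_ms,
                                        uint32_t *last)
{
    uint32_t v = rd(ctx);
    for (unsigned waited = 0; v & PORTSC_PR; waited++) {
        if (waited >= timeout_ms) {
            *last = v;
            return WAIT_TIMEOUT;
        }
        delay(ctx, 1);
        v = rd(ctx);
    }
    *last = v;
    return WAIT_DONE;
}

/* Desk-check simulator: PR clears on the clear_on_read-th read
 * (0 = never, which models the reported lockup). */
struct fake_port {
    uint32_t portsc;
    unsigned reads, clear_on_read, slept_ms;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_port *p = ctx;
    p->reads++;
    if (p->clear_on_read != 0 && p->reads >= p->clear_on_read)
        p->portsc &= ~PORTSC_PR;
    return p->portsc;
}

static void fake_delay(void *ctx, unsigned ms)
{
    ((struct fake_port *)ctx)->slept_ms += ms;
}
```

On the target, the WAIT_TIMEOUT path is where USBSTS and the returned PORTSC value would be logged. This does not address the root cause; it only makes the failure observable without a debugger attached.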

  • I will ask the USB experts to look at this. Feedback will be posted directly here when available.
  • I’ve reviewed the trace and have some questions:

    1) High-speed enumeration failed. Are you limiting device operation to full speed on purpose? If not, this could indicate a signal integrity issue.

    2) Which OS and driver are you using?

    3) Can you provide a USB topology overview?

     

    From the trace I see that the controller successfully configures the hub and can communicate with both EP0 (Config) and EP3 (Out). Attempts to communicate with EP1 and EP2 are 100% NAK’ed and eventually both time out. The controller resets the bus in an attempt at recovery, but the hub fails to re-enumerate. I'm guessing at this point the hub is in the weeds.

    1. For this trace, we are connecting to a full speed device, but we are not limited to full speed.
    2. Quadros RTXC OS and HCC Stacks/Drivers.
    3. AM3703-->SMSC USB3320 PHY-->SMSC USB2512 HUB

  • Jerry,
    Why is the PHY->HUB link segment FS? I would expect a HS-capable hub to connect at HS regardless of what is connected downstream (unless HS is specifically disabled in the hub).

    1) Have you tried plugging in different devices to see if the issue remains?
    2) Which AM37xx port are you using, and what is connected to the others?
    3) Have you read the USB-specific Advisories in the errata and implemented the recommendations contained therein? Advisory 2.1 is of special concern.
  • The PHY-->HUB link segment is not FS. I said I'm connecting to a FS device. In this case it's a handheld scanner.

    1. Our SW guy is plugging in different devices, but the problem exists even without a device plugged in.
    2. We are connected to Host Port 2 for this issue. Host Port 1 isn't used for USB; it's mostly GPIO. Port 0 is connected to the TPS65921 PMIC PHY as OTG.
    3. I'll send that off to our SW guy.
  • Ahh, I see. I misunderstood and thought the trace was from further upstream. This means that the hub was successfully enumerated, and that the downstream device (scanner) was also successfully enumerated. The failure occurs at the first attempt by the host driver to read data from the scanner *after* enumeration. It's also important to note that the first write transaction to the scanner is successful, so that part of the equation is OK. My hypothesis at this point is that the host/scanner driver doesn't know how to talk to the scanner: either it's asking for data in a format the controller doesn't understand, or the scanner simply has no data to return. The PR bit is controlled by SW, so the first task would seem to be determining which driver is setting this bit, and why.

    I don't see anything that would indicate a controller issue. Everything is perfect from a protocol perspective.

    It would be helpful to see a protocol trace of a HS device in the same scenario as the FS device you provided earlier to further verify upstream signal integrity and Host functionality. I'd like to see that the Hub did indeed enumerate at HS.
  • We don't support any HS devices, but would it help to see a trace between the PHY and Hub?

  • Jerry,

    I don't see any value in that at this point. The USB has no errors (protocol or otherwise) downstream of the hub, so I wouldn't expect to see any upstream of the hub.

    - Since you don't support HS devices, it might be useful to limit the hub to FS operation. I believe this can be done via Hub EEPROM.

    - I'd verify that the hub is not entering Suspend, at least for the purposes of this debug.

    FWIW, the BeagleBoard-xM uses the same PHY and an SMSC USB hub, so it may be useful in your testing/SW development.

  • OK, let's back up a bit so I can remind you of the issue we are facing, according to our SW guy. He says that when the EHCI locks up, the PR bit in the PORTSC_i register is stuck and won't clear. According to SMSC, they have a workaround for an erratum they published that sounds similar to what is happening. They also claim that it solves Advisory 1.64 in your errata. I've attached their errata so you can look at it (it's Module 4).

    Would their solution solve our lock up issue?

    80000648A.pdf

  • Jerry,

    It's possible that the SMSC erratum work-around could resolve the lock-up issue you are seeing. We've seen this issue before with the 332x, but the circumstances were a bit different from your case.

    In my last post I suggested that you limit the SMSC hub to FS mode via EEPROM as this would prevent Step 2 outlined in the errata from occurring. As a FS-only hub, the required HS->FS transition outlined in the SMSC doc would never happen and the lockup condition described should be avoided.

  • More info: I reworked a board with the TUSB1210 PHY and we are getting the same failure. Our SW guy says it is behaving identically to the SMSC PHY. So let's rule out PHY interoperability issues, since this is now your PHY. What next? I'm lost.

  • Dave, can you tell me whether this scope shot of the ULPI 60 MHz clock looks right? It's 425 mV pk-pk and centered around 1 V. That doesn't look right to me.

  • Dave,

    Ignore that scope shot. I had the high-frequency filter on. It looks fine now.

    So I still have this failure where the EHCI port is stuck in reset. Looking for ideas to get to root cause.

  • Jerry,
    OK, we can move further upstream:

    1) Were you able to verify that the AM37xx errata were understood and all work-arounds implemented? (See my post of 10/24 #3). Last update had your SW guy looking into it.
    2) Please provide the TI schematic review ID# for this board design.
    3) Please provide your ULPI timing analysis information for this board design.
  • Cirrus ULPI Study 1.xlsx

    1) Yes, SW claims to have implemented the workarounds, although the SMSC PHY items no longer apply, since I've reworked boards with TI PHYs and we still have the problem.

    2) I don't know what that means.

    3) Attached

  • More info...I just took my board and changed the boot order from SYS_BOOT[6:0]=0110001 to SYS_BOOT[6:0]=0010001 and the USB device port stopped working. This is connected to HSUSB port 0 through the PMIC PHY. I then put the boot order back to the original and the USB device port began working again.

    1. Why did this happen?

    2. Is this possibly related to my EHCI lockup issue?
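Written out, the two strap settings differ in a single pin. A tiny C check (helper names are hypothetical) XORs the two values, 0110001b = 0x31 and 0010001b = 0x11: the result is 0x20, so only SYS_BOOT[5] changed, and the question reduces to what that one pin selects in the boot-configuration table of the AM37x TRM.

```c
#include <stdint.h>

/* Pins of SYS_BOOT[6:0] that differ between two strap settings. */
static uint8_t sysboot_diff(uint8_t a, uint8_t b)
{
    return (uint8_t)((a ^ b) & 0x7Fu);   /* SYS_BOOT[6:0] is 7 bits wide */
}

/* Index of the lowest set bit in mask, or -1 if no bit is set. */
static int lowest_set_bit(uint8_t mask)
{
    for (int i = 0; i < 8; i++)
        if (mask & (1u << i))
            return i;
    return -1;
}
```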

  • Dave,

    I just talked with Gunter and we want to back up here and be sure that everyone understands the failure modes and what we have done so far.

    There are two failure modes we have seen.

    1. The EHCI locks up when warm resets are issued in an iterative test.

    2. The USB host port fails to enumerate about half the time when powering up (cold resets).

    What we've done so far...

    1. The EHCI lock-up has not been affected by any HW or SW changes we've made so far. The iterative test issues warm resets until the failure happens; the interval is random, but it usually takes 20-80 warm resets before the lock-up occurs. When a lock-up occurs, the USB D+ line simply goes high and then the host attempts no communication, even though the ULPI clock is running normally. It looks like the host is already in the lock-up condition before enumeration would have been attempted.

    2. The USB host port failure on power-up appears to have been solved by changing from the SMSC PHY to the TI PHY. The USB trace attached in an earlier post captured this failure, with the SMSC PHY. I will be doing more power-up tests to be sure this failure is solved, but there have been none over the last 3 days (maybe 30 power-ups).

  • Further info...Late last Friday, I was able to get an EHCI lock-up from a cold reset. It took maybe 25 resets to get it, so that is about the same frequency of lock-ups as warm resets.

  • Some more information from our SW guy...

    a bit more info about the hang-up

    I traced the startup code and found that the routine that is hanging is called on USB stack initialization.

    I see the startup logic is resetting the root hub/port. This is accomplished by:

    1) Checks that the root hub is not suspended and has not been halted. If the root hub is in either state, the reset logic is aborted (we are not seeing this).

    2) The stacks are blocked for 200 ms waiting for the power to stabilize (is this long enough?). Question: what is the PMIC doing when the error occurs? Are all the power domains up and stabilized? I ask because the processor is reset but the PMIC is not; it is only re-configured. The same commands are sent to the PMIC regardless of whether it is a power-up or a warm reset.

    3) The USB port is disabled.

    4) There are a couple of checks to determine whether the reset is allowed. I do not know what these are for, as there are no comments, but we are not hitting that path, since the hang-up occurs only when the reset is allowed.

    5) The port is reset.

    6) The stacks delay for 50 ms.

    7) The port reset is de-asserted.

    8) The code falls into a loop awaiting reset completion. In most cases all goes as expected, but every so often taking the port out of reset DOES NOT WORK! Furthermore, in the error condition, when I try to manually clear the reset bit in the processor register, the bit just does not go to the cleared state.
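The eight steps above can be sketched in C to make the register-level ordering explicit. Everything here is an assumption for illustration: struct port_io, root_port_reset and the fake_port model are hypothetical, not the HCC stack, and step 4's stack-specific checks are omitted. Two details from the EHCI 1.0 PORTSC description are worth checking against the stack. First, the change bits CSC, PEDC and OCC are write-1-to-clear, so a plain read-modify-write can acknowledge them by accident. Second, if I read the specification correctly, the controller may hold PR at one while USBSTS.HCHalted (bit 12) is set, which is why step 1 checks it. Step 8 is bounded here, unlike the loop in the original listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* EHCI 1.0 register bits used below. */
#define USBSTS_HCHALTED (1u << 12)
#define PORTSC_CSC      (1u << 1)   /* write-1-to-clear */
#define PORTSC_PED      (1u << 2)
#define PORTSC_PEDC     (1u << 3)   /* write-1-to-clear */
#define PORTSC_OCC      (1u << 5)   /* write-1-to-clear */
#define PORTSC_SUSP     (1u << 7)
#define PORTSC_PR       (1u << 8)
#define PORTSC_W1C      (PORTSC_CSC | PORTSC_PEDC | PORTSC_OCC)

/* Hypothetical register/delay hooks; not the HCC stack API. */
struct port_io {
    uint32_t (*rd_usbsts)(void *ctx);
    uint32_t (*rd_portsc)(void *ctx);
    void (*wr_portsc)(void *ctx, uint32_t v);
    void (*sleep_ms)(void *ctx, unsigned ms);
    void *ctx;
};

enum reset_result { RESET_OK, RESET_SKIPPED, RESET_STUCK };

/* Read-modify-write PORTSC without acknowledging any write-1-to-clear bits. */
static void portsc_update(const struct port_io *io, uint32_t set, uint32_t clr)
{
    uint32_t v = io->rd_portsc(io->ctx) & ~PORTSC_W1C;
    io->wr_portsc(io->ctx, (v & ~clr) | set);
}

static enum reset_result root_port_reset(const struct port_io *io,
                                         unsigned settle_ms, unsigned done_ms)
{
    /* 1) Skip if the controller is halted or the port is suspended. */
    if ((io->rd_usbsts(io->ctx) & USBSTS_HCHALTED) ||
        (io->rd_portsc(io->ctx) & PORTSC_SUSP))
        return RESET_SKIPPED;
    io->sleep_ms(io->ctx, settle_ms);          /* 2) power settle (200 ms in the stack) */
    portsc_update(io, 0, PORTSC_PED);           /* 3) disable the port                   */
    /* 4) stack-specific "reset allowed" checks omitted */
    portsc_update(io, PORTSC_PR, PORTSC_PED);   /* 5) assert reset with PED written 0    */
    io->sleep_ms(io->ctx, 50);                  /* 6) hold reset for 50 ms               */
    portsc_update(io, 0, PORTSC_PR);            /* 7) de-assert reset                    */
    for (unsigned ms = 0; io->rd_portsc(io->ctx) & PORTSC_PR; ms++) {
        if (ms >= done_ms)                      /* 8) bounded completion wait            */
            return RESET_STUCK;                 /* log USBSTS/PORTSC here                */
        io->sleep_ms(io->ctx, 1);
    }
    return RESET_OK;
}

/* Desk-check model: after software clears PR, the "hardware" keeps PR set
 * unless release is true (release = false models the lockup). */
struct fake_port {
    uint32_t usbsts, portsc;
    bool release;
    unsigned writes, slept_ms;
    uint32_t log[4];                            /* first PORTSC writes */
};

static uint32_t f_sts(void *c) { return ((struct fake_port *)c)->usbsts; }
static uint32_t f_rd(void *c) { return ((struct fake_port *)c)->portsc; }
static void f_sleep(void *c, unsigned ms) { ((struct fake_port *)c)->slept_ms += ms; }

static void f_wr(void *c, uint32_t v)
{
    struct fake_port *p = c;
    if (p->writes < 4)
        p->log[p->writes] = v;
    p->writes++;
    if (!(v & PORTSC_PR) && (p->portsc & PORTSC_PR) && !p->release)
        v |= PORTSC_PR;                         /* PR refuses to clear */
    p->portsc = v;
}
```

Starting from PORTSC = 0x00001005, the model records the writes 0x00001001 (port disabled), 0x00001101 (PR set) and 0x00001001 (PR cleared). With release = false the call returns RESET_STUCK and PORTSC stays at 0x00001101, which mirrors the symptom reported above.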

  • Jerry,

    So if the reset cycle is successful, does the interface always work as expected?

  • Could you modify the loop timing to 1s (from 200ms) to see if the behavior changes?
  • Hi,

    The customer increased the time to 1 sec and it did not change the behavior.

    Regards,
    --Gunter
  • Just updating the thread to reflect the fact that we are actively trying to repro this internally. I'll post here with updates as they become available.