DP83825I: DP83825I PHY Link Hang after Stress Test (OMAP-L138/DaVinci EMAC) – Requires Software Reset to Recover

Part Number: DP83825I
Other Parts Discussed in Thread: OMAP-L138

Hi everyone,

I am experiencing a persistent "link hang" issue with the DP83825I PHY connected via RMII to an OMAP-L138 (davinci_emac).

The Issue: During a stress test involving repeated hot-plugging (plug/unplug) of the Ethernet cable, the link eventually enters a state where it is permanently "Down" (carrier 0). The system stays in this state indefinitely, even if a known-good cable is reconnected.

Recovery: The only way to recover the link is to manually perform a Digital Soft Reset via MDIO:

    phytool write eth0/0/0 0x8000

Once this reset is issued, the link immediately comes back up and DHCP completes successfully.
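The manual recovery described above could in principle be automated. The following is only a minimal sketch of one possible workaround, not a TI-recommended fix: it assumes hypothetical `read_reg(reg)` and `write_reg(reg, value)` callables that perform the MDIO access (for example by wrapping phytool), and the timeout values are arbitrary. It cannot tell a hung PHY from an unplugged cable, so it simply resets whenever the link has stayed down for the whole timeout.

```python
import time
from typing import Callable

BMCR = 0x00          # Basic Mode Control Register (IEEE 802.3 clause 22)
BMSR = 0x01          # Basic Mode Status Register
BMCR_RESET = 0x8000  # BMCR bit 15: reset, self-clearing
BMSR_LINK = 0x0004   # BMSR bit 2: link status, latches low until read


def link_is_up(read_reg: Callable[[int], int]) -> bool:
    """Read BMSR twice; the first read clears a latched link-down event."""
    read_reg(BMSR)
    return bool(read_reg(BMSR) & BMSR_LINK)


def reset_if_link_stuck(
    read_reg: Callable[[int], int],
    write_reg: Callable[[int, int], None],
    down_timeout_s: float = 10.0,  # assumed value, tune for the product
    poll_s: float = 1.0,           # assumed value
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the link; if it stays down for down_timeout_s, write the reset.

    Returns True if a reset was issued, False if the link came up first.
    """
    start = now()
    while now() - start < down_timeout_s:
        if link_is_up(read_reg):
            return False
        sleep(poll_s)
    write_reg(BMCR, BMCR_RESET)  # same write as the phytool command above
    return True
```

The clock and sleep functions are passed in so the policy can be exercised without hardware. On a real system the reset also fires when the cable is simply unplugged, which costs one redundant reset per timeout period.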

Register Observations during the "Hang" state:

  • REG 0x01 (BMSR): 0x7849 (link is down and auto-negotiation is not complete).

  • REG 0x10 (PHYSTS): 0x1912 or 0x0812. Note that Bit 12 (Signal Detect) is often 1, but Bit 0 (Link Status) remains 0.
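To make the BMSR reading above easier to check, here is a small sketch that decodes the standard IEEE 802.3 clause 22 status bits of register 0x01. It compares the hang value 0x7849 with 0x786d, the healthy value reported later in this thread. The function and dictionary names are illustrative only, and the vendor-specific PHYSTS register is deliberately not decoded here.

```python
# Low-order status bits of BMSR (register 0x01), per IEEE 802.3 clause 22.
BMSR_BITS = {
    "extended_capability": 0,
    "jabber_detect": 1,
    "link_status": 2,
    "autoneg_ability": 3,
    "remote_fault": 4,
    "autoneg_complete": 5,
}


def decode_bmsr(value: int) -> dict:
    """Return each named BMSR status bit as a boolean."""
    return {name: bool((value >> bit) & 1) for name, bit in BMSR_BITS.items()}


for label, raw in (("good", 0x786D), ("hang", 0x7849)):
    flags = decode_bmsr(raw)
    print(f"{label}: 0x{raw:04x} "
          f"link={flags['link_status']} "
          f"an_complete={flags['autoneg_complete']} "
          f"remote_fault={flags['remote_fault']}")
```

The two values differ only in bit 2 (link status) and bit 5 (auto-negotiation complete), so BMSR by itself shows that the link is down but not why.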

  • Hello,

    This is unexpected. How often is the PHY getting hot plugged? I have a concern that the state machine may be stuck, but this can be due to other factors such as a glitch on the reference clock. During the hang state, is Reg 0x5 being populated?

    Also what is the link partner in this case? Does this occur with other link partners or with another DP83825 as a link partner?

    Sincerely,

    Gerome

  • Hi Gerome,

    I have captured a register dump at the exact moment the link hangs during a rapid plug/unplug stress test. I'm connected to the router via the RJ45.

    For Good Case:

    REG 0x00: 0x3100
    REG 0x01: 0x786d
    REG 0x02: 0x2000
    REG 0x03: 0xa140
    REG 0x04: 0x01e1
    REG 0x05: 0xcde1
    REG 0x06: 0x000f
    REG 0x07: 0x2001
    REG 0x08: 0000
    REG 0x09: 0000
    REG 0x0a: 0x0100
    REG 0x0b: 0000
    REG 0x0c: 0000
    REG 0x0d: 0x4007
    REG 0x0e: 0000
    REG 0x0f: 0000
    REG 0x10: 0x4615
    REG 0x11: 0x0108
    REG 0x12: 0x6400
    REG 0x13: 0x2800
    REG 0x14: 0000
    REG 0x15: 0000
    REG 0x16: 0x0100
    REG 0x17: 0x0065
    REG 0x18: 0x0480
    REG 0x19: 0x8c00
    REG 0x1a: 0000
    REG 0x1b: 0x007d
    REG 0x1c: 0x05ee
    REG 0x1d: 0000
    REG 0x1e: 0x0102
    REG 0x1f: 0000

    Bad Case:

    REG 0x00: 0x3100
    REG 0x01: 0x7849
    REG 0x02: 0x2000
    REG 0x03: 0xa140
    REG 0x04: 0x01e1
    REG 0x05: 0000
    REG 0x06: 0x0007
    REG 0x07: 0x2001
    REG 0x08: 0000
    REG 0x09: 0000
    REG 0x0a: 0x0100
    REG 0x0b: 0000
    REG 0x0c: 0000
    REG 0x0d: 0x401f
    REG 0x0e: 0x1000
    REG 0x0f: 0000
    REG 0x10: 0x1812
    REG 0x11: 0x0108
    REG 0x12: 0xe600
    REG 0x13: 0x2800
    REG 0x14: 0000
    REG 0x15: 0000
    REG 0x16: 0x0100
    REG 0x17: 0x0065
    REG 0x18: 0x0480
    REG 0x19: 0x8000
    REG 0x1a: 0x0010
    REG 0x1b: 0x007d
    REG 0x1c: 0x05ee
    REG 0x1d: 0000
    REG 0x1e: 0x0102
    REG 0x1f: 0000
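For readers comparing the two dumps above by eye, the following sketch lists only the registers that differ and which bits changed. The values are transcribed from the dumps in this post; the helper name `diff_dumps` is illustrative.

```python
# Registers 0x00..0x1f, transcribed from the "Good Case" dump above.
GOOD = [
    0x3100, 0x786D, 0x2000, 0xA140, 0x01E1, 0xCDE1, 0x000F, 0x2001,
    0x0000, 0x0000, 0x0100, 0x0000, 0x0000, 0x4007, 0x0000, 0x0000,
    0x4615, 0x0108, 0x6400, 0x2800, 0x0000, 0x0000, 0x0100, 0x0065,
    0x0480, 0x8C00, 0x0000, 0x007D, 0x05EE, 0x0000, 0x0102, 0x0000,
]
# Registers 0x00..0x1f, transcribed from the "Bad Case" dump above.
BAD = [
    0x3100, 0x7849, 0x2000, 0xA140, 0x01E1, 0x0000, 0x0007, 0x2001,
    0x0000, 0x0000, 0x0100, 0x0000, 0x0000, 0x401F, 0x1000, 0x0000,
    0x1812, 0x0108, 0xE600, 0x2800, 0x0000, 0x0000, 0x0100, 0x0065,
    0x0480, 0x8000, 0x0010, 0x007D, 0x05EE, 0x0000, 0x0102, 0x0000,
]


def diff_dumps(good, bad):
    """Return (reg, good, bad, changed_bits) for every register that differs."""
    return [
        (reg, g, b, g ^ b)
        for reg, (g, b) in enumerate(zip(good, bad))
        if g != b
    ]


for reg, g, b, changed in diff_dumps(GOOD, BAD):
    print(f"REG 0x{reg:02x}: 0x{g:04x} -> 0x{b:04x} (changed bits 0x{changed:04x})")
```

Registers 0x0d and 0x0e are the indirect (MMD) access control and data registers, so their values depend on the last extended-register access and may not reflect PHY state.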

    I have measured the clock; it seems to maintain 50 MHz. 

    [ 66.573865] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 73.849079] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 75.933756] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 78.009091] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 80.088749] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 83.209090] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 95.688799] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 96.729095] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 104.015925] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 106.089105] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx

    [ 117.528748] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 126.889076] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 134.168740] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 137.289095] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 141.458741] davinci_emac davinci_emac.1 eth0: Link is Down
    [ 145.609268] davinci_emac davinci_emac.1 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
    [ 150.813809] davinci_emac davinci_emac.1 eth0: Link is Down

    Sincerely, 

    Wei Han

  • Hello Wei Han,

    Unfortunately, the dumps don't point out anything specific. Most deltas shown in the dump can be attributed to link status, which was already known from the top level summary. Can you please provide more information about the link partner, and if certain link partners trigger this behavior?

    How fast are you hot plugging? In the bad state, are FLPs still present when measuring on the RJ-45 with 100 Ohm termination (either provided by the link partner or elsewhere, such as a scope or load board)?

    Sincerely,
    Gerome

  • Hello Gerome, 

    I didn't wait for link-up to complete before performing the next link-down and link-up, so the hot-plug rate is quite fast.
    As the log shows, I'm using polling to read the link state. After the issue occurred, it kept showing "No link detected."

    Good Case: REG 0x01: 0x786d

    Bad Case: REG 0x01: 0x7849

    Also, even after I write BMCR = 0x8000, the issue comes back very quickly, and sometimes the link does not recover at all. Do you have any idea what could cause this?

    When the DP83825 is working normally and we measure the TD_P pin on the Ethernet cable, we observe pulse bursts at roughly 60 Hz, which matches the Fast Link Pulse (FLP) burst interval (~16 ms).

    When the hang occurs and we measure the same pin, the ~60 Hz bursts are no longer present.

    Sincerely,

    Wei Han

  • Hi Wei,

    The working hypothesis is that hot plugging this many times in quick succession is too aggressive and could be causing the state machine to lock up. While some hot plugging is acceptable, multiple attempts within an extremely short period of time could be contributing to this.

    Some customers do have hot plug test cases, but it seems what you are describing is a very aggressive use case. If you slow down the hot plug occurrence rate or reduce the number of tests per run, is the issue still present?

    Sincerely,

    Gerome

  • Hi Gerome,

    Thanks for your reply. Our main concern is that we cannot control the end-user environment, and that the PHY enters a 'latched' state where it stops responding to link pulses entirely until BMCR_RESET or a power cycle.

    Could you explain the internal logic the DP83825 uses to transition the Link Status bit? Specifically, how does the state machine handle a 'Link Down' event that occurs while 'Auto-Negotiation' or 'Descrambler Lock' is still in progress?

    Sincerely, 

    Wei Han

  • Hello,

    Unfortunately, this goes into proprietary implementation details of the PHY which I cannot disclose.

    Another working thought is that this kind of hot plugging can also induce an ESD event at the PHY, which would also cause the PHY to latch up. One way to confirm this is to check whether the current consumption on the supply rails is higher than in normal idle operation. Is it possible to measure this with an ammeter?

    If this is the case, we have other PHYs in our portfolio which are more robust in ESD testing.

    Sincerely,

    Gerome