DP83822H: sporadically no ETH link after power cycle

Part Number: DP83822H
Other Parts Discussed in Thread: AM4376

  • We are using the DP83822H PHY in our design together with an AM4376 as the MAC.
  • The issue seems to be temperature dependent (occurrence increases with temperature). We are testing at Ta = +70°C; Tc at the DP83822H is <90°C.
  • Our first assumption, sporadic unintended strapping at power-up (because of temperature, input leakage, etc.), could not be confirmed yet: the readout of the strapping registers after power-up was consistent and did not differ when the link was missing.
  • The scope plot below shows the boot timing at Ta = 25°C. MII2_COL was monitored because its strapping level affects PHY address bit 0 and the PHY mode (modes 2 and 3 = FX_EN, fiber mode).
  • At power-up, MII2_COL is pulled up internally. About 200 ms later, /POWER_RES is released and the MAC starts to boot (1st stage: MII2_COL at <2 V; 2nd stage: MII2_COL with internal PU at <3.3 V).
  • MII2_COL sits ~200 mV below VDD, presumably because of the voltage drop caused by input leakage across the internal PU, but the level is well within the intended strapping mode 4 (2.29 V to 3.3 V). There is no external PU/PD resistor on MII2_COL.
  • At the end of the 2nd boot stage, the PHY reset is released (light blue) and the PHY changes to operating mode, driving MII2_COL low.
  • Not shown (but verified): the first MDIO access starts no earlier than 200 ms after PHY reset release (MDC starts 200.1 ms after reset).
  • From the timing diagrams in the datasheet it is not clear which diagram applies to the presented start-up phase and at what point in time the strapping pins are latched.

  • The datasheet (section 8.5.1) states: "the values of these pins are sampled at power up or hardware reset", and it shows two diagrams (Figure 1, Power-Up Timing, and Figure 2, Reset Timing).
  • In both diagrams, is the PHY clock really already active before the power supply ramps up?
  • In the Power-Up diagram (Figure 1), the reset is released at the same time as the supply ramps up, which differs from the boot sequence presented above. Which timing diagram applies here?
  • T2 = max. 200 ms, the post-power-up stabilization time after reset release before MDC may start, is met (not shown here, but verified).
  • T3 = typ. 200 ms, the hardware configuration latch-in time for power-up after reset release, cannot be valid here, because the PHY is already actively driving MII2_COL low at this time. This means the latch-in must have happened earlier, but when?
  • T4 = typ. 64 ns after the latch-in, the output drivers become active and drive MII2_COL low. This also shows that the latch-in cannot be 200 ms after reset release in the presented sequence.

    • If Figure 2 (Reset Timing) applies, it shows a second reset low pulse of T1 = min. 10 µs, which is missing in the presented boot sequence.
    • Assuming Figure 2 applies, the latch-in would take place T3 = typ. 120 ns after reset release, and the outputs would become active T4 = typ. 64 ns after the latch-in. In more detailed measurements of the reset release, MII2_COL is driven low ~650 ns after the reset release, so it looks more like Figure 2 applies. But would a further reset low pulse of >10 µs be required?
    • Do you see anything fundamental in the presented boot sequence that could explain the sporadic link losses after a power cycle that we see in the test?
    • Which figure applies in the presented case, and at what point in the presented boot sequence should the latch-in take place?
    • Could there even be multiple latch-ins (after power-up and after PHY reset release)?
    • Is an additional reset low pulse of >10 µs required before the reset is released?
    • MII2_COL rises from 0 V at the beginning of the power cycle, which indirectly indicates that the PHY was completely unpowered before.
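The comparison between the two datasheet figures and the measurement can be written out as simple arithmetic. This is only a sketch using the typical values quoted in the bullets above (T3 and T4 per figure, and the measured ~650 ns); the dictionary layout and function names are illustrative and not taken from the datasheet.

```python
# All times in nanoseconds, taken from the values quoted in the post.
FIGURES_NS = {
    "Figure 1 (Power-Up Timing)": {"t3_latch": 200_000_000, "t4_output": 64},
    "Figure 2 (Reset Timing)": {"t3_latch": 120, "t4_output": 64},
}
MEASURED_NS = 650  # MII2_COL driven low ~650 ns after reset release


def expected_output_active_ns(figure: str) -> int:
    """Reset release to outputs active = latch-in time (T3) + driver enable time (T4)."""
    timing = FIGURES_NS[figure]
    return timing["t3_latch"] + timing["t4_output"]


def closest_figure(measured_ns: int) -> str:
    """Return the figure whose T3 + T4 is nearest to the measured delay."""
    return min(FIGURES_NS, key=lambda f: abs(expected_output_active_ns(f) - measured_ns))


if __name__ == "__main__":
    for name in FIGURES_NS:
        print(f"{name}: outputs active after {expected_output_active_ns(name)} ns")
    print(f"Measured {MEASURED_NS} ns is closest to {closest_figure(MEASURED_NS)}")
```

With typical values, Figure 2 predicts 184 ns and Figure 1 predicts roughly 200 ms, so the measured 650 ns is of the same order of magnitude as Figure 2 only. Typical values are not limits, so this does not settle which figure formally applies.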

  • Hi Ralph,

    What do the bootstrap registers 0x467 and 0x468 read when the issue occurs?

    These registers need to be accessed through extended register access; instructions are found in the SMI section.

    Please also read registers 0x0 to 0x1F for me.
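The requested readout could look roughly like the sketch below. It assumes the standard IEEE 802.3 clause 22 indirect access through registers 0x0D (REGCR) and 0x0E (ADDAR) and assumes device address 0x1F for the extended register space; please check the SMI section of the datasheet for the exact sequence. `mdio_read`, `mdio_write`, and `FakePhy` are hypothetical stand-ins for the platform's MDIO driver and are not part of any TI software.

```python
REGCR = 0x0D          # register control register (indirect access control)
ADDAR = 0x0E          # address/data register
DEVAD = 0x1F          # assumption: device address of the extended register space
FUNC_DATA = 0x4000    # REGCR function field: data, no post-increment


def read_extended(mdio_read, mdio_write, phy_addr, reg):
    """Read one extended register using the indirect access sequence."""
    mdio_write(phy_addr, REGCR, DEVAD)              # select address function
    mdio_write(phy_addr, ADDAR, reg)                # latch the target address
    mdio_write(phy_addr, REGCR, FUNC_DATA | DEVAD)  # switch to data function
    return mdio_read(phy_addr, ADDAR)


def dump_registers(mdio_read, mdio_write, phy_addr):
    """Dump direct registers 0x00..0x1F plus the strap registers 0x467/0x468."""
    dump = {reg: mdio_read(phy_addr, reg) for reg in range(0x00, 0x20)}
    for reg in (0x467, 0x468):
        dump[reg] = read_extended(mdio_read, mdio_write, phy_addr, reg)
    return dump


class FakePhy:
    """Hypothetical in-memory PHY model for trying the sequence without hardware."""

    def __init__(self, direct, extended):
        self.direct, self.extended = dict(direct), dict(extended)
        self._regcr, self._addr = 0, 0

    def _data_mode(self):
        return (self._regcr & 0xC000) != 0

    def write(self, phy_addr, reg, value):
        if reg == REGCR:
            self._regcr = value
        elif reg == ADDAR and self._data_mode():
            self.extended[self._addr] = value
        elif reg == ADDAR:
            self._addr = value
        else:
            self.direct[reg] = value

    def read(self, phy_addr, reg):
        if reg == REGCR:
            return self._regcr
        if reg == ADDAR:
            return self.extended.get(self._addr, 0) if self._data_mode() else self._addr
        return self.direct.get(reg, 0)


if __name__ == "__main__":
    phy = FakePhy({0x01: 0x786D}, {0x467: 0x1234, 0x468: 0x5678})  # made-up values
    for reg, value in dump_registers(phy.read, phy.write, phy_addr=1).items():
        print(f"0x{reg:03X}: 0x{value:04X}")
```

Passing the read and write callables in keeps the sequence independent of the MDIO driver, so the same dump can be logged for both PHYs (MII1 and MII2) on every power cycle.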

  • Hi Ross,

    Yes, we are aware of the indirect addressing of registers 0x467/0x468.

    Please find attached a spreadsheet with the analysis of the readout of several PHY registers. We have two PHYs (MII1 and MII2) in the system. The issue occurred on several target modules exclusively on MII2, so we first assumed it was specific to MII2, but then the link issue also occurred (reproducibly) on MII1 of another DUT. The tests are done at Ta = 70°C.

    In the spreadsheet, you will find the register readout for MII1 and MII2 right before the test stopped because of the missing link on MII1. Parsing the log file with all the register dumps before the link loss does not show any difference. This is why I am asking the main question: at what point in the shown boot sequence does the latch-in of the pin strapping happen?

    A good explanation for the effect would be sporadic unintended pin strapping (wrong PHY address or wrong PHY mode (FX_EN) because of undefined strapping levels caused by PU/PD, input leakage, etc.), but as I wrote earlier, we have not been able to prove this from the PHY register readouts in the missing-link case so far.

    We are currently continuing the tests with the modules on which we saw the effect on MII2, and we may see something there. For the register dump, we added a wait time after the PHY reset release to make sure the latch-in of the strapping pins is over and the link is established (in the good case).

    In the earlier register dumps, the link status bit was always 0, also in the good cases. So we assume the dump was always taken before a link was established.

    Today is my last day before summer vacation. My colleague on the mailing list of this ticket will follow up on this issue. It is very important for us to find the root cause soon, because this is the only open issue blocking our current product launch.

    Best regards

    Ralph

    PHY Register Dump Develop CPU ETH2 link loss second test.xlsx

  • Hi Ross,

    Ralph is now on vacation and I am taking over this issue. We retested with a built-in delay before the register dump to make sure we do not read the registers before the link is established.

    This works fine so far. We see the link status in register 0x1, for example.

    I analyzed the registers but could not identify a root cause. The strapping is as expected. One thing I discovered is that the Auto-MDIX bit changed (register 0x10, bit 14). But it changed multiple times, also on "good" start-ups. I think that because both link partners use Auto-MDIX, the result sometimes differs (but is identical on both partners), so this should be OK.

    The register dump shows that the auto-negotiation process has not finished, but I cannot see why.
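To make the "link up" and "auto-negotiation finished" checks explicit before a dump is taken, a sketch like the following could be used. It relies on the standard IEEE 802.3 clause 22 status register 0x1 (bit 2 = link status, latched low; bit 5 = auto-negotiation complete). The 5 s timeout and 100 ms poll interval are arbitrary assumptions, and `mdio_read` is a hypothetical driver callback.

```python
import time

BMSR = 0x01                      # basic mode status register (clause 22)
BMSR_LINK_STATUS = 1 << 2        # latched low: reads 0 once after any link drop
BMSR_AUTONEG_COMPLETE = 1 << 5


def read_link_state(mdio_read, phy_addr):
    """Return the current link and auto-negotiation state."""
    mdio_read(phy_addr, BMSR)            # first read clears a latched link-down
    bmsr = mdio_read(phy_addr, BMSR)     # second read reflects the current state
    return {
        "link_up": bool(bmsr & BMSR_LINK_STATUS),
        "autoneg_complete": bool(bmsr & BMSR_AUTONEG_COMPLETE),
        "raw": bmsr,
    }


def wait_for_link(mdio_read, phy_addr, timeout_s=5.0, poll_s=0.1,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the link is up or the timeout expires; return the last state."""
    deadline = clock() + timeout_s
    while True:
        state = read_link_state(mdio_read, phy_addr)
        if state["link_up"] or clock() >= deadline:
            return state
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated readback values: link down twice, then link up with AN complete.
    samples = iter([0x7849, 0x7849, 0x7849, 0x786D])
    print(wait_for_link(lambda addr, reg: next(samples), phy_addr=1, poll_s=0.0))
```

Logging the returned state together with each register dump makes it easy to separate dumps taken before link-up from dumps of a genuinely failed start-up.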

    Best regards,

    Christian

    DP83822 registers.xlsx

  • Dear all,

    I have done some measurements at my desk, some at room temperature and others with the device heated up with a heat gun.

    I had no issue with the link, and I did not notice any temperature-dependent change in behaviour.

    Attached some measurements:

    - Supply ramp-up, Reset, RXCLK output

    - Supply interruption

    - MDI lines (Auto-MDIX enabled, so RX / TX may be swapped)

    Measurements at room temperature:

    * The RXCLK pulse is not captured properly. It looks like a low-frequency pulse, but it is actually 25 MHz; see the following pictures.

    Measurements with heated-up device:

  • Hi Christian,

    You no longer see issues?

  • Hi Ross,

    I am commenting because I started this thread before my vacation.

    We were talking to TI directly in parallel to this thread. A more detailed PHY register analysis of the failing case showed that the information about the link partner capability was missing because auto MDI/MDIX was failing.

    TI proposed enabling the "robust" auto-MDIX feature, which gives auto MDI/MDIX more time and prevents a deadlock in case the link partner takes too long to switch. With this flag enabled, we no longer saw the link problem in our tests.

    Some questions remain open, since we discovered a temperature dependency. The TI product group offered a detailed description for our case, which we are still waiting for.

    But enabling the "robust" auto-MDIX feature seemed to fix our problem.
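For readers who want to try the same setting, the flag can be set with a read-modify-write over MDIO, as in the sketch below. The register address (0x09) and bit position (bit 5) are assumptions recalled from the DP83822 register map and must be verified against the datasheet before use; `mdio_read` and `mdio_write` are hypothetical driver callbacks.

```python
CR1 = 0x09                  # assumption: control register that holds the flag
ROBUST_AUTO_MDIX = 1 << 5   # assumption: bit position of "robust auto-MDIX"


def enable_robust_auto_mdix(mdio_read, mdio_write, phy_addr,
                            reg=CR1, mask=ROBUST_AUTO_MDIX):
    """Set the flag without disturbing other bits; return True if it reads back set."""
    old = mdio_read(phy_addr, reg)
    new = old | mask
    if new != old:
        mdio_write(phy_addr, reg, new)
    return (mdio_read(phy_addr, reg) & mask) == mask


if __name__ == "__main__":
    regs = {CR1: 0x0100}  # made-up starting value for a dry run without hardware

    def fake_read(phy_addr, reg):
        return regs[reg]

    def fake_write(phy_addr, reg, value):
        regs[reg] = value

    print(enable_robust_auto_mdix(fake_read, fake_write, phy_addr=1))
    print(hex(regs[CR1]))
```

The write has to be repeated after every PHY reset, since register settings made over MDIO are normally lost when the PHY is reset or power cycled.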

    Best regards and thank you

    Ralph Schelling