DP83867CS: The link status read over the MDIO is intermittent

Part Number: DP83867CS
Other Parts Discussed in Thread: AM625


We are finding that the link status of the DP83867CS is not being read reliably over MDIO. Our new board, Beagle Violet, uses an AM625 host processor running Linux. By default, MDIO is implemented by bit-banging GPIOs rather than by the hardware MDIO controller. This choice may date back to an early implementation of the AM625 on a BeaglePlay with a different (non-TI) PHY. Is there any known reason to use bit-bang MDIO with this TI PHY? Does anyone have experience with unreliable link status?

We partly avoid this problem by reading the PHY status twice when the link appears to be going from UP to DOWN. This improved our reliability greatly, but it is a change to a Linux driver that we would like to avoid. We also noticed false DOWN-to-UP readings on a second Ethernet port that was left unconnected.
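The workaround described above amounts to one extra BMSR read whenever a poll would report an UP-to-DOWN transition. A minimal sketch of that polling step, assuming a hypothetical `read_reg(phy_addr, reg)` MDIO read callback (not an actual Linux kernel API):

```python
BMSR = 0x01                # Basic Mode Status Register
BMSR_LINK_STATUS = 1 << 2  # bit 2: link status (latch-low)


def poll_link(read_reg, phy_addr, was_up):
    """Return the link state to report for one polling cycle.

    read_reg(phy_addr, reg) -> int is a hypothetical MDIO read callback.
    A DOWN reading that follows an UP state is confirmed with a second
    read before it is reported.
    """
    up = bool(read_reg(phy_addr, BMSR) & BMSR_LINK_STATUS)
    if was_up and not up:
        # Re-read once; only report DOWN if the second read agrees.
        up = bool(read_reg(phy_addr, BMSR) & BMSR_LINK_STATUS)
    return up
```

This only filters a single bad reading. If the MDIO transfer itself returns corrupted data on two consecutive reads, the false DOWN still gets through.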

  • Hi Victor,

    Is there any known reason to use the bit-bang MDIO with this TI PHY?

    DP83867 does not require bit-bang MDIO; however, bit-banging should be fine as long as the MDC/MDIO timing requirements are met.
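    For reference, a bit-banged MDIO driver produces the IEEE 802.3 Clause 22 frame in software by toggling MDC and driving MDIO. A minimal sketch of the bit sequence for a register read is below; it leaves out the GPIO access and MDC timing, which are exactly the parts that must meet the timing requirements. The function names are hypothetical.

```python
def mdio_read_header(phy_addr, reg_addr):
    """Bits the controller drives on MDIO for a Clause 22 read, before TA.

    Frame: 32-bit preamble of ones, ST=01, OP=10 (read), 5-bit PHYAD,
    5-bit REGAD, all MSB first. During the 2-bit turnaround (TA) the
    controller releases MDIO and the PHY drives the second TA bit low,
    followed by the 16 data bits.
    """
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32):
        raise ValueError("PHYAD and REGAD are 5-bit fields")
    bits = [1] * 32        # preamble
    bits += [0, 1]         # ST: start of frame
    bits += [1, 0]         # OP: read (a write uses 0, 1)
    bits += [(phy_addr >> i) & 1 for i in range(4, -1, -1)]
    bits += [(reg_addr >> i) & 1 for i in range(4, -1, -1)]
    return bits


def data_bits_to_word(data_bits):
    """Assemble the 16 data bits sampled from the PHY (MSB first)."""
    if len(data_bits) != 16:
        raise ValueError("expected 16 data bits")
    word = 0
    for bit in data_bits:
        word = (word << 1) | (bit & 1)
    return word
```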

    We sort of avoid this problem by reading the PHY status twice when the link appears to be going from UP to DOWN.

    To confirm my understanding: you are seeing the LINK STATUS bit (register 0x0001[2]) read low when it should be high? Note that this bit is latch-low, meaning LINK STATUS will remain '0' even after the link comes back up. It takes one read to release the latch and a second read to see the actual status. I believe this is why reading twice fixes your issue.
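    The latch-low behaviour can be modelled in a few lines. This is a simplified software model for illustration only (it ignores the other BMSR bits and any PHY-internal timing), and the class name is hypothetical:

```python
class LatchLowBit:
    """Model of a latch-low status bit such as BMSR bit 2 (LINK STATUS).

    Once the underlying signal goes low, reads return 0 until the register
    has been read; that read re-arms the latch, so the next read returns
    the live value again.
    """

    def __init__(self, live):
        self.live = live
        self.latched = live

    def set_live(self, value):
        self.live = value
        if not value:
            self.latched = False  # a low event is captured and held

    def read(self):
        value = self.latched
        self.latched = self.live  # reading releases the latch
        return value
```

A brief link drop that has already recovered makes the first read return `False` and the second return `True`, which matches the behaviour described in this thread.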

    There is also a LINK_STATUS bit (register 0x0011[10]) that is not latch-low and directly reflects the current link status. If you read this bit instead, do you still see the problem?
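    Reading PHYSTS instead of BMSR avoids the latch entirely. A minimal sketch, again assuming a hypothetical `read_reg(phy_addr, reg)` MDIO read callback; the register address and bit position are the ones given above:

```python
PHYSTS = 0x0011          # DP83867 PHY Status Register
PHYSTS_LINK = 1 << 10    # bit 10: link status, not latched


def link_up_from_physts(read_reg, phy_addr):
    """Return True if PHYSTS reports the link as up.

    read_reg(phy_addr, reg) -> int is a hypothetical MDIO read callback,
    e.g. a wrapper around whatever MDIO access the platform provides.
    """
    return bool(read_reg(phy_addr, PHYSTS) & PHYSTS_LINK)
```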

    Best,

    Shane

  • The PHY polling routine in the Linux kernel reads the Basic Mode Status Register (BMSR, register 0x01). While the Ethernet link is up, we get occasional link-down indications. I understand that when a low is read, it takes a read to release the latch. We added code to read the link status a second time to confirm the link-down; this improved things but did not fix the problem completely.

    Just to be clear, we were using the bit-bang version of the MDIO driver because that is how the Linux kernel and drivers were configured for the AM625 platform.

    As a test, we then forced the use of the hardware MDIO and, lo and behold, we do not get any erroneous status reads. This would solve our problem, BUT there is an erratum saying we should use bit-bang MDIO. In fact, it says that if we use the hardware MDIO we will occasionally read a link-down status (which is exactly what we see with the proposed bit-bang workaround).

    So we don't know which way is better. Perhaps you can tell us whether Errata i2329 still applies to current production AM625 processors with the DP83867CS PHY.

    https://software-dl.ti.com/processor-sdk-linux/esd/AM62X/08_04_01_03/exports/docs/linux/Foundational_Components/Kernel/Kernel_Drivers/Network/CPSW3g.html#errata-i2329-mdio-interface-corruption-cpsw-and-pruss

  • Hi Victor,

    From the PHY perspective there is no issue with using the standard hardware MDIO; this seems more relevant to the AM62x processor.

    Can you submit a separate E2E ticket with the processor part number? This will direct the thread to the team that supports AM625. I support our Ethernet PHYs but do not have context on why the processor requires a bit-bang MDIO.

    Best,

    Shane

  • Well, we found interesting evidence related to this issue. We monitored the MDIO_MDC signal while the PHY was sending/receiving data. The signal is usually good, but if there is a concurrent PHY transfer, the MDC signal goes to full, half, and quarter voltage levels. It's as if the VDDIO pull-up voltage sags, or maybe there is a clamp limiting the maximum voltage during those times. We're not sure whether it's the PHY or the AM6254 processor, however. It also appears to be independent of hardware MDIO vs. bit-banging.

  • Interesting find. Are you certain this is the MDC signal? It looks more characteristic of the MDIO line; MDC would look like a clock signal while data is being transferred.

    I haven't seen that behavior on our PHY EVMs. From the PHY perspective, MDIO should be open drain (there is no pull-up to VDDIO inside the PHY), so the PHY shouldn't be causing this sag. Typically you would have an external pull-up on the MDIO line.

    Best,

    Shane

  • Hi Victor,

    I hope you're doing well. As there are no open actions, I'll close this thread for now due to inactivity.

    Feel free to reply here with any further questions on DP83867 and I will continue to respond.

    Best,

    Shane

  • Hi Shane,

    Sorry, I was OOO for the holiday. This issue is still open for us. You were correct that the signal is MDIO (not MDC). That was my haste in trying to post before an unrelated conference call.

    As new information, we have two DP83867CS PHYs in our system (one on each RGMII) sharing the same MDIO bus. If we hold the second PHY in RESET, we do not see this error. This seems to imply that the second PHY somehow interferes with proper status reads. We can use the RESET workaround for our current customer, but not all customers would be satisfied with it.

    Thanks,

    Victor

  • Hi Victor,

    Thank you for the update. Can you share a schematic of the MDC/MDIO lines? I'd like to see the full path from your controller to both PHYs if possible.

    Are the PHYs using different MDIO addresses?
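    One way to confirm the addresses from software is to scan all 32 Clause 22 addresses and read the PHY identifier registers (0x02 and 0x03). A sketch, assuming a hypothetical `read_reg(addr, reg)` MDIO read callback; the DP83867 ID below is the value the Linux dp83867 driver matches on (0x2000A231 with the revision nibble masked off) and should be checked against the datasheet:

```python
PHYIDR1, PHYIDR2 = 0x02, 0x03
DP83867_ID = 0x2000A230
MODEL_MASK = 0xFFFFFFF0   # ignore the 4-bit revision field


def scan_mdio_bus(read_reg):
    """Return {address: 32-bit PHY ID} for every address that responds.

    read_reg(addr, reg) -> int is a hypothetical MDIO read callback. An
    address with no PHY usually reads back as all ones, because nothing
    drives MDIO and the pull-up holds it high.
    """
    found = {}
    for addr in range(32):
        phy_id = (read_reg(addr, PHYIDR1) << 16) | read_reg(addr, PHYIDR2)
        if phy_id in (0x00000000, 0xFFFFFFFF):
            continue
        found[addr] = phy_id
    return found


def is_dp83867(phy_id):
    """True if the ID matches the DP83867 model, any revision."""
    return (phy_id & MODEL_MASK) == DP83867_ID
```

Two PHYs strapped to distinct addresses should appear as two entries; a single entry would suggest both PHYs are answering at the same address.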

    Best,

    Shane

  • The full path runs over three boards.  The main board Ethernet schematic is below.

    The second PHY schematic is below:

    I should note that we caught and corrected the pin-out error for LED_0 and LED_2.  The nets were swapped on the second PHY in order to configure that board to use Mirror mode.

    Yes, the PHYs are at two different addresses.

    Just to recap: we read incorrect status from the working PHY (false down) and from the disconnected PHY (false up) when both PHYs are out of reset and a packet transfer is occurring during the MDIO operation. That is when the incorrect voltage levels were seen. If we hold the disconnected PHY in RESET, we get correct link status reads.

    Thanks,

    Victor

  • Hi Victor,

    I see what looks like two sets of 2.2K pull-ups on both MDC and MDIO, one near the processor and one near the U3 PHY. MDIO should have a single 2.2K pull-up, and MDC does not need one (from the PHY perspective). Does removing one of the MDIO pull-up resistors fix the incorrect voltage levels?
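    To put numbers on the duplicate pull-ups: if both 2.2K resistors really sit on the same MDIO net, the effective pull-up halves, which doubles the current a device must sink to hold a low and halves the RC rise time. A rough sketch; the 3.3 V pull-up rail and the 50 pF bus capacitance are assumptions for illustration, not values taken from the schematic:

```python
import math


def parallel(*resistances):
    """Equivalent resistance of resistors in parallel (ohms)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def sink_current(v_pullup, r_pullup):
    """Current a driver must sink to hold the line near 0 V (amps)."""
    return v_pullup / r_pullup


def rise_time_10_90(r_pullup, c_bus):
    """10 %-90 % rise time of a single-pole RC pull-up, ln(9)*R*C (s)."""
    return math.log(9) * r_pullup * c_bus


V_PULLUP = 3.3   # assumed pull-up rail (V)
C_BUS = 50e-12   # assumed total bus capacitance (F)

for label, r in (("one 2.2K", 2200.0), ("two 2.2K", parallel(2200.0, 2200.0))):
    print(f"{label}: R={r:.0f} ohm, "
          f"I_sink={sink_current(V_PULLUP, r) * 1e3:.1f} mA, "
          f"t_rise={rise_time_10_90(r, C_BUS) * 1e9:.0f} ns")
```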

    Additionally, I do not see where the PHY-side center taps are terminated in your MagJack. DP83867 should have 0.1 µF capacitors to GND on each center tap, and the center taps should not be shorted together:

                     

    Also, for my understanding: is the link status the only bit that reads an unexpected value? If this were an MDC/MDIO bus issue, I would expect reads of all registers to have problems.

    Best,

    Shane

  • What you did not see is an I2C bus isolator used on the board with the Ethernet PHY. From what I understand, we need a pull-up on both sides of the I2C level shifter/isolator.

  • Hi Victor,
    Thank you for clarifying. For my understanding, is the link status the only bit being read incorrectly?

    I want to be sure we're correctly isolating the problem to the MDIO interface, or whether this is perhaps an MDI implementation issue.

    Best,

    Shane

  • This is something I cannot answer without a fair amount of programming effort. What we do see in the captured trace is a series of bits that are probably in error. I suspect the bits near the low level are being driven in different directions by each PHY. I know MDIO is supposed to be open drain, but this waveform cannot be explained by open-drain behavior. It's being driven push-pull.
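    The intermediate levels are what you would expect if two outputs drive the line in opposite directions at the same time: the line settles at a resistive divider between the high-side and low-side output impedances. On a true open-drain bus, only a weak pull-up ever opposes a low-side switch, so a low stays close to 0 V. A rough model, with purely hypothetical output resistances:

```python
def contention_voltage(vdd, r_out_high, r_out_low, r_pullup=None):
    """Steady-state line voltage when one driver pushes high and another
    pulls low at the same time.

    Each push-pull output is modelled as an ideal switch with a series
    output resistance (hypothetical values, in ohms). An optional pull-up
    to the same rail adds in parallel with the high-side path.
    """
    r_high = r_out_high
    if r_pullup is not None:
        r_high = 1.0 / (1.0 / r_out_high + 1.0 / r_pullup)
    # Voltage divider: rail -> r_high -> line -> r_out_low -> GND
    return vdd * r_out_low / (r_high + r_out_low)
```

Under this model, equal output resistances would put the line at about half the rail and a 3:1 ratio at about a quarter, the kind of stepped levels described earlier in the thread.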

    Keep in mind this only happens when the second PHY is not in use (but not held in RESET) and there is RGMII activity to the first PHY.

  • Hi Victor,

    I'm trying to replicate this in our lab, but I do not see the issue. I do see the voltage drop somewhat during transmission, but not as dramatically as in your image. The MDIO bus works in my setup for both link-up and link-down scenarios. Here is a waveform from the MDIO bus in my setup:

    I have two DP83867 EVMs (one R variant and one S variant) connected to an MSP430 LaunchPad acting as the MDIO controller, both sharing one MDIO/MDC bus. I'm curious whether your MDIO controller could be the issue here, and whether using a LaunchPad as the controller would fix it. If you're able to access the MDIO/MDC lines on your board with an external MDIO controller, we have public USB2MDIO software that can be used to emulate an MDIO controller on an MSP430 LaunchPad.

    Keep in mind this only happens when the second PHY is not in use (but not held in RESET) and there is RGMII activity to the first PHY.

    Since the issue seems to happen only in one specific scenario, I do not believe this is necessarily a hardware implementation issue. If there were a hardware implementation issue, I would expect the failure to occur regardless of the reset status or RGMII activity of a PHY. I have a few more questions to help us narrow this down:

    • Is the behavior tied to only one PHY? I understand that when one PHY is disconnected and not held in reset, this issue happens when the other PHY transmits data. If you swap the transmitting and disconnected PHYs, does the issue still show up?
    • Are you able to replace the PHY ICs on your board? I want to rule out a defective part. If you can replace both PHYs (one at a time, testing in between), we can make sure a faulty chip is not at play here.
    • Can you double-check that your power-on and reset timings are within the datasheet specs for both PHYs? When power-on timings are not met, the PHY can end up in a bad state, which can lead to all kinds of issues.

    If we can rule out a faulty PHY and the power-on timings, we should then check that the MDIO controller is not the source of this issue.

    Best,

    Shane

  • Well, one thing to consider is that the AM6254 has an anomaly with the MDIO peripheral, and the recommendation is to use bit-banged GPIO instead. We have tried both and see the same type of failure.

    We should note that this only occurs if we have more than one PHY on the MDIO bus (or have either PHY held in reset). That makes us think it has nothing to do with the processor anomaly. The link status error can occur with either PHY (assuming one is being held in reset). One PHY is connected while the other is not. We can get false link-up and link-down readings.

    Unfortunately, we don't have an easy way to connect another MDIO master. I'll think about what we might be able to do. As for the PHYs, we have tried at least two boards for each PHY (they are on separate boards). As for power sequencing, we realized we needed to improve this on one board and made an appropriate change. However, the PHYs work perfectly for network operations, with no unexpected packet errors. So I think this is strictly some sort of weird MDIO situation.

    Thanks for listening and offering suggestions.  We are not ready to give up on this yet.

  • Hi Victor,

    I'm trying to think of other tests we can do to help narrow this down. It sounds like the issue is not tied to either PHY. Since you're seeing this on multiple boards, it does not seem like a faulty chip issue either. 

    I remember you mentioned it would take programming effort to read other registers on the PHY. Is there any possibility of reading other registers here?

    • You wouldn't need to read all registers, but a spot check of one or two other registers in the failing case would give us a better idea of the scope of this issue. Basically, if the link status is the only register showing a problem, something may be affecting the link itself.
    • I'm also curious how you're reading the PHY registers from Linux. Is there software running that could be interfering with the reads during the specific RGMII transmit scenario? Is there any chance the implementation is reading the wrong PHY?
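    A spot check could be as simple as repeatedly reading one register whose value never changes, such as PHYIDR1 (0x02), during the failing RGMII traffic scenario and counting mismatches. A sketch assuming a hypothetical `read_reg(phy_addr, reg)` MDIO read callback; the expected value 0x2000 is the upper half of the DP83867 PHY ID used by the Linux driver and should be confirmed against the datasheet:

```python
def spot_check(read_reg, phy_addr, reg, expected, count=1000):
    """Read a register whose value should never change; list mismatches.

    read_reg(phy_addr, reg) -> int is a hypothetical MDIO read callback.
    Returns a list of (read_index, value_read) for every bad read.
    """
    bad = []
    for i in range(count):
        value = read_reg(phy_addr, reg)
        if value != expected:
            bad.append((i, value))
    return bad
```

Mismatches on PHYIDR1 would point at the MDIO bus; a clean PHYIDR1 alongside bad link-status reads would point back at the link.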

    Are you seeing the correct signal on LED_0 when link is up/down? If LED_0 is correct but the register is not then I agree this is likely an MDIO issue.

    Are you able to share the full MDIO/MDC path in a PDF schematic? I'm curious how the level shifter is implemented (and a PDF is easier to follow than images, since you can CTRL+F net names).

    Unfortunately we don't have an easy way to connect another MDIO master.

    It wouldn't be pretty, but perhaps you could solder wires onto the pads of R133 and R135 to get access to MDC/MDIO. If you can put the AM625 pins into high-Z, an MSP430 could control the bus. You'd need to connect an external pull-up on MDIO, though.

    Best,

    Shane