DP83867IS: Communication problems

Part Number: DP83867IS

Hi Team,

There are communication problems in a customer system that uses the DP83867IS.
Could you help us?

<Background>
- 1000 Mbps communication is expected via auto-negotiation. 10BASE (half duplex) and 100BASE (half duplex) are also advertised.
- The link is established according to the registers and the LED; however, there is no ping response from the link partner.
- The customer's system has passed the compliance test, so we checked the registers.
- Register 0x000A reads 0x3800 under normal conditions and 0x78FF when the communication problem occurs.
- According to this, the local device is a slave under normal conditions but a master when the problem occurs. I think this is the cause of the communication problems.
- Register 0x0009 is 0x0200, so manual master/slave configuration of the local device is disabled.
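
For readers following along, the two register 0x000A values above can be decoded with a short sketch. It assumes the standard IEEE 802.3 layout of the 1000BASE-T status register (bit 15 configuration fault, bit 14 master/slave resolution, bits 7:0 idle error count); the function name `decode_sts1` is made up for illustration, and the bit positions should be checked against the DP83867 datasheet.

```python
def decode_sts1(value):
    """Decode the 1000BASE-T status register (0x000A), assuming the IEEE 802.3 layout."""
    return {
        "ms_config_fault": bool(value & 0x8000),       # bit 15
        "resolved_master": bool(value & 0x4000),       # bit 14: 1 = master, 0 = slave
        "local_rx_ok": bool(value & 0x2000),           # bit 13
        "remote_rx_ok": bool(value & 0x1000),          # bit 12
        "lp_1000t_full_duplex": bool(value & 0x0800),  # bit 11
        "lp_1000t_half_duplex": bool(value & 0x0400),  # bit 10
        "idle_error_count": value & 0x00FF,            # bits 7:0
    }


normal = decode_sts1(0x3800)   # value reported under normal conditions
failing = decode_sts1(0x78FF)  # value reported during the communication problem

for name, fields in (("normal", normal), ("failing", failing)):
    print(name, fields)
```

Under that layout, the only differences in the failing case are that the local PHY resolved as master and that the 8-bit idle error count reads 0xFF, the largest value the field can hold. The fault bit (bit 15) is clear in both values.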

<Question>
1) What is the difference between the master/slave roles?
2) When and how are the master/slave roles determined?
3) I think that communication problems will occur if the master/slave conflicts with the link partner. Is it possible for a conflict to occur even if manual settings are disabled?

Best Regards,

  • Hi, 

    1. The difference between master and slave mode is that in master mode, the PHY uses its own generated clock for data transmission and receiving, while in slave mode, the PHY uses the recovered clock for the same. Essentially, the master dictates the timing of the link and the slave follows it.
    2. Master/slave resolution is part of the auto-negotiation process that takes place when the PHY is connected to a link partner. More information about this can be found in section 7.4.3.2 of the datasheet.
    3. For communication to work properly, one device must be in master mode and the other in slave mode. A conflict here could be the cause of the issue.

    "When we checked Register 0x000A, it is 0x3800 under normal conditions and 0x78FF under communication problems"

    Just to clarify, does the register 0x0A have value 0x3800 upon bootup, then after linking up, it changes to 0x78FF, or is it that it continues to hold the value 0x3800 even after link up, up until you start pinging which makes it change to 0x78FF?

    What is the link partner being used in this case?

    Additionally, if the PHY being in master mode seems to be the issue, we could try forcing it to slave mode using register 0x09 bits [12:11] as 1 and 0 respectively. 
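
As a rough sketch of what that register write looks like, the snippet below computes the new values only; it does not talk to hardware. The helper names are hypothetical. It assumes the IEEE 802.3 meaning of register 0x09 bits 12 and 11, and it assumes (as is typical for advertised settings) that the new role only applies at the next auto-negotiation, which is why it also builds a value for register 0x00 with the standard restart bit set.

```python
MANUAL_MS_ENABLE = 1 << 12  # register 0x09 bit 12: 1 = manual master/slave configuration
MANUAL_MS_VALUE = 1 << 11   # register 0x09 bit 11: 1 = master, 0 = slave
AN_ENABLE = 1 << 12         # register 0x00 bit 12: auto-negotiation enable
AN_RESTART = 1 << 9         # register 0x00 bit 9: restart auto-negotiation


def force_slave(cfg1):
    """Return the register 0x09 value with bits [12:11] set to 1 and 0."""
    return (cfg1 | MANUAL_MS_ENABLE) & ~MANUAL_MS_VALUE & 0xFFFF


def restart_autoneg(bmcr):
    """Return the register 0x00 value that requests a new auto-negotiation."""
    return (bmcr | AN_ENABLE | AN_RESTART) & 0xFFFF


print(hex(force_slave(0x0200)))      # value reported in this thread for register 0x09
print(hex(restart_autoneg(0x1140)))  # 0x1140 is only an example starting value
```

In a real driver these values would be written through whatever MDIO access routine the system provides, using a read-modify-write so that the other bits in each register are preserved.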

    Best,

    Vivaan

  • Hi Vivaan-san,
    Thank you for your reply.

    <Your question 1>
    Just to clarify, does the register 0x0A have value 0x3800 upon bootup, then after linking up, it changes to 0x78FF, or is it that it continues to hold the value 0x3800 even after link up, up until you start pinging which makes it change to 0x78FF?

    <My answer 1>
    Register 0x0A is 0x3800 after bootup. However, the link partner needs to be rebooted once a day, and communication problems occur at that time. The frequency of the communication problems is 1 in 200 reboots. Once the communication problems occur, pings do not go through at all. When they read register 0x0A at this time, the value is 0x78FF.

    <Your question 2>
    What is the link partner being used in this case?

    <My answer 2>
    The link partner is an STB (set-top box) made by another manufacturer. This STB is sold in high volume. We cannot identify the part number of its PHY.

    <Your advice>
    Additionally, if the PHY being in master mode seems to be the issue, we could try forcing it to slave mode using register 0x09 bits [12:11] as 1 and 0 respectively.

    <My answer 3>
    I'll suggest it to the customer.

    The purpose of this E2E post is to solve the 1/200 communication problem. Please let me know if you have any advice.
    There are also differences in other registers between normal and problematic times. We are investigating these in parallel.

  • Hi Atsushi-san,

    What other differences did you notice between the two? Could you share the register dump for both cases?

    One theory about what might be happening could be that the device is auto-negotiating to the wrong setting after the reboot of the link partner. Do we have more information about the link partner at all? Does it support auto-negotiation?

    You could also try resetting the PHY after the link partner reboots, which restarts auto-negotiation. If this is an edge case where the PHY is incorrectly configured during auto-negotiation, which is plausible given that the issue occurs only 1 in 200 times, a reset would give the PHY another chance to configure itself correctly.
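
One way to picture this suggestion is a small recovery loop on the host side. This is only a sketch under stated assumptions: `read_reg`, `ping`, and `reset_phy` are hypothetical callables that the system would have to supply, `reset_phy` is assumed to return only after the PHY has had time to link up again, and the link status bit is the standard IEEE 802.3 one (register 0x01 bit 2).

```python
LINK_STATUS = 0x0004  # register 0x01 bit 2: link status


def stuck_link(bmsr, ping_ok):
    """True for the signature described in this thread: link up, but no ping response."""
    return bool(bmsr & LINK_STATUS) and not ping_ok


def recover(read_reg, ping, reset_phy, max_resets=3):
    """Reset the PHY while the stuck-link signature persists; return the number of resets used."""
    resets = 0
    while resets < max_resets and stuck_link(read_reg(0x01), ping()):
        reset_phy()  # hypothetical: e.g. toggle the PHY reset pin and wait for link-up
        resets += 1
    return resets
```

A caller would run `recover(...)` after each link partner reboot and treat a return value equal to `max_resets` as "still failing, escalate".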

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your reply.

    I'll send you the register information by private message.

    We are having difficulty getting information about our link partners.

    Which of the following resets are you referring to?

    - Hardware Reset

    - IEEE Software Reset

    - Global Software Reset

  • Hi Atsushi-san,

    I am referring to the hardware reset which resets all the registers and the device itself.

    I saw your message but did not see a register dump attached. Could you please provide it for debugging?

    The normal and low power mode settings do not appear in the register map because those settings are reserved. Regardless, that debug step is usually taken for cases where the link itself is going down. As I understand your setup, the link is always up, but the ping operation is not working.

    Did the customer try resetting the PHY after the STB reboots?

    In this 1/200 case, are you able to ping the PHY from the STB? I noticed you said you were trying to ping the STB from the PHY.

    Another step we could take would be to probe the MDI Lines of the PHY to see if the ping operation actually went through. If so, there might be an issue with the STB being unable to receive the ping.

    Best,

    Vivaan

  • Hi Vivaan-san,

    You said:
    "Additionally, if the PHY being in master mode seems to be the issue, we could try forcing it to slave mode using register 0x09 bits [12:11] as 1 and 0 respectively."

    My customer tried this during a communication failure (link is up, but there is no ping response), but the bit was not rewritten.
    Can this bit really be rewritten during operation?
    Please let me know if there is any reason why the write would not succeed.

    However, when my customer started the system with the PHY fixed as slave, no communication failure occurred.
    So I think the problem is in master/slave resolution.

    I have another question:
    If both the local PHY and the link partner have auto-negotiation enabled, is it possible for both to become master (or both slave)? Under what conditions does that occur?

  • Hi Atsushi-san,

    I took a look at the registers provided as well. I noticed several idle errors, an auto-negotiation error interrupt, and a polarity change error interrupt.

    Given that in fixed slave mode, there are no communication issues, I believe you are correct, the master/slave resolution seems to be the problem here. 

    Since we do not have much information about the STB, and the issue occurs about 1 in 200 times that the STB is reset, it looks like we have stumbled upon a marginal case in which auto-negotiation between the devices occasionally fails. This failure can have a few different causes. When it happens, the PHY defaults to master mode, while the STB also seems to be in master mode. Since both devices need a common clock to communicate properly, it caused the ping failure issue because both devices were using their own unsynchronized clocks to transmit and receive data.

    One possible reason could be a timing mismatch between the two devices' auto-negotiation settings. If one PHY is using Fast AN and the other is not, it could lead to unexpected behavior such as this. For our PHY, this can be checked using register 0x1E bits [14:12].
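
A minimal sketch of that check is shown below. The split of the field into an enable bit (bit 14) and a two-bit timer selection (bits 13:12) is my reading of the DP83867 register map and should be treated as an assumption to verify against the datasheet; the function name is made up.

```python
def decode_fast_an(cfg3):
    """Extract the Fast AN field from a register 0x1E value (assumed layout)."""
    return {
        "fast_an_enable": bool(cfg3 & (1 << 14)),  # assumed: bit 14 = Fast AN enable
        "fast_an_select": (cfg3 >> 12) & 0x3,      # assumed: bits 13:12 = timer selection
    }


print(decode_fast_an(0x0000))  # Fast AN field all zeros
print(decode_fast_an(0x5000))  # bit 14 and bit 12 set
```

If the enable bit reads 0, Fast AN is off on the DP83867 side and a Fast AN timing mismatch would not explain the failure.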

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your kind support.

    I checked: Fast AN is disabled on the customer system.

    Fast AN is also not supported by the link partner, which uses a Realtek PHY.

    Is Fast AN a TI-specific feature?

  • Hi Atsushi-san,

    Fast AN is not described by the IEEE standard and is not universal. I thought the link partner PHY was unknown, so I wanted to confirm whether AN was a potential issue. To my knowledge, Realtek does not offer Fast AN.

    From these tests, it looks like auto-negotiation completes with errors in rare cases. The error seems to point specifically to master/slave resolution, since forcing the PHY to slave mode fixes the issue. We could try switching cables to see whether the cable is the cause, but the cause could also be the link partner's behavior on power cycling.

    Is forcing slave mode a feasible solution for the customer? It looks like it resolves the rare 1/200 auto-negotiation failure. 

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your support.

    Forcing slave mode is not a solution, because the customer system may also be connected to devices other than this STB.
    The customer system needs this flexibility, so it is preferable for master/slave to be determined by auto-negotiation.

    We would like to solve this issue by understanding the auto-negotiation process.

    1) How is master mode or slave mode determined during auto-negotiation?
    For example, are the roles resolved by certain bits in the FLP (Fast Link Pulse) field, or does the first device to receive the FLP become master?

    Best Regards,

  • Hi Atsushi-san,

    Master/slave configuration is a rather complicated process performed during auto-negotiation. It is outlined in detail in the IEEE 802.3 standard: section 40.5.2 covers 1000BASE-T, and section 55.6.2 of the 2012 release covers the equivalent process for 10GBASE-T.
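
For orientation, here is a simplified sketch of the 1000BASE-T resolution rules as I understand them from IEEE 802.3 clause 40.5.2. It is not taken from the DP83867 datasheet and leaves out retry counting and timers. Each side advertises its register 9 settings (manual enable, manual value, port type) plus a random seed during the auto-negotiation page exchange. The role follows from comparing those values, not from which side receives pulses first. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Advert:
    manual: bool     # register 9 bit 12: manual master/slave configuration enabled
    master: bool     # register 9 bit 11: requested role, meaningful only when manual
    multiport: bool  # register 9 bit 10: multi-port device (prefers master)
    seed: int        # random seed exchanged during auto-negotiation


def rank(a):
    """Priority order: manual master > multi-port > single-port > manual slave."""
    if a.manual:
        return 3 if a.master else 0
    return 2 if a.multiport else 1


def resolve(local, partner):
    """Return 'master', 'slave', 'fault', or 'retry' for the local device."""
    lr, pr = rank(local), rank(partner)
    if lr != pr:
        return "master" if lr > pr else "slave"
    if local.manual:
        return "fault"  # both sides forced to the same role
    if local.seed == partner.seed:
        return "retry"  # equal seeds: new seeds are generated and exchanged again
    return "master" if local.seed > partner.seed else "slave"


# Both sides single-port with automatic configuration, as in this thread's setup.
print(resolve(Advert(False, False, False, 100), Advert(False, False, False, 900)))
```

In this model, two automatically configured devices can only end up in the same role if the values one side acts on differ from what the other side sent, or if one side does not follow the rules.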

    From the tests we have done so far, it looks like the bits received from the link partner during auto-negotiation are corrupted, which causes the PHY to inadvertently resolve to the wrong role (master). In order to solve this behaviour using auto-negotiation, we would need to ensure that the link partner is sending the correct values for auto-negotiation.

    We can try to reproduce the issue with a different PHY as a sanity check, but from the tests we have conducted, I believe this behaviour might be caused by the STB link partner.

    Best,

    Vivaan

  • Hi Vivaan-san.

    I really appreciate your courteous response.

    You said:
    "In order to solve this behaviour using auto-negotiation, we would need to ensure that the link partner is sending the correct values for auto-negotiation."

    How can I check this specifically?
    Some methods may be difficult because the information we can get about the link partner is limited.

    If master/slave configuration is a complicated process, I think checking it will be particularly difficult.

    Best Regards,

  • Hi Atsushi-san,

    I don't believe this would be a feasible option. It would require tracking the auto-negotiation packets being exchanged and decoding them manually.

    Given that there isn't much we know about the link partner, would it be possible for you to test a different link partner with the 867? We can even try using another 867 as a link partner. Keeping the link partner as fixed master, we can try running tests and see if we observe the same behaviour as with the STB. If we don't see the same behaviour and the PHYs perform as expected, we can narrow down the cause of this behaviour to the STB link partner. 

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your reply.

    We'll try to check communication with the link partner fixed as master.

    In a previous post, you said the following:

    "Since both devices need a common clock to communicate properly, it caused the ping failure issue because both devices were using their own unsynchronized clocks to transmit and receive data."

    Could you tell me more about this?

    Do you mean that the common clock is the waveform generated by the 4D-PAM5 coding on the MDI lines?

  • Hi Atsushi-san, 

    The purpose of master and slave modes is that the slave follows the master's clock for sampling data. In normal communication, the slave uses the clock recovered from the received signal, instead of its local clock, to ensure proper sampling.
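
To get a feel for why unsynchronized clocks matter, the back-of-the-envelope sketch below estimates how quickly two free-running clocks drift apart by one symbol period. It is a deliberately simplified model, not a description of the DP83867 receiver. The 125 MBd symbol rate is the 1000BASE-T rate per pair; the 200 ppm offset (each side off by 100 ppm in opposite directions) is an assumed worst case for illustration.

```python
SYMBOL_RATE_HZ = 125_000_000  # 1000BASE-T symbol rate on each pair


def symbols_per_slip(offset_ppm):
    """Number of symbols after which two clocks differ by one full symbol period."""
    return 1_000_000 / offset_ppm


def slips_per_second(offset_ppm):
    """How many whole-symbol slips accumulate per second at the given offset."""
    return SYMBOL_RATE_HZ * offset_ppm / 1_000_000


offset_ppm = 200  # assumed: +100 ppm on one side, -100 ppm on the other
print(symbols_per_slip(offset_ppm))   # symbols until the clocks are one symbol apart
print(slips_per_second(offset_ppm))   # one-symbol slips per second
```

With these assumed numbers the clocks move a full symbol apart every few thousand symbols, which is why one side has to follow the other's timing rather than run from its own oscillator.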

    Best,

    Vivaan