
DP83867IS: Utilizing the loopback function

Part Number: DP83867IS


Hi Team,

A communication problem occurs with a specific link partner.

About once in every 200 link-ups, ping does not work.

Based on the register information, we believe the cause is a master/slave mismatch during auto-negotiation.

Is the loopback function effective for identifying the cause of the master/slave mismatch?

I have asked several questions about this issue, but it has not yet been resolved.

We need to resolve this issue soon.

DP83867IS: Communication problems
https://e2e.ti.com/support/interface-group/interface/f/interface-forum/1447700/dp83867is-communication-problems

DP83867IS: About IDLE ERROR COUNTER in register 0x000A bit[7:0].
https://e2e.ti.com/support/interface-group/interface/f/interface-forum/1448230/dp83867is-about-idle-error-counter-in-register-0x000a-bit-7-0

Best Regards,

  • Hi Atsushi-san, 

    Is loopback function effective in identifying the cause of the Master/Slave mismatch?

    Loopback functionality is useful for validating the data path, specifically the xMII and MDI lines, or for narrowing down a particular behaviour to one part of the data path. Since auto-negotiation occurs before a link has even been established, loopback would not be an effective means of debugging it.

    I believe we were able to force the PHY to slave mode, which helped with the 1-in-200 ping failures, but this was not an effective workaround for the customer's end case. This points toward the link partner.

    Would it be possible to read the link partner's registers in the 1-in-200 failure case? This would let us see the state of the link partner as well as the PHY, so we can check whether there is indeed a mismatch in master/slave resolution.
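    As a sketch of what to compare in the two dumps: assuming both PHYs implement the standard IEEE 802.3 Clause 40 1000BASE-T Status register (0x0A, the same register whose bits [7:0] hold the idle error count), bit 14 reports whether each side resolved to MASTER, and a consistent link has exactly one MASTER. The helper names and the example values below are hypothetical, not taken from the customer's dump.

```python
from dataclasses import dataclass


@dataclass
class GbtStatus:
    """Fields of the IEEE 802.3 Clause 40 1000BASE-T Status register (0x0A)."""
    ms_config_fault: bool  # bit 15: MASTER-SLAVE configuration fault
    resolved_master: bool  # bit 14: 1 = local PHY resolved to MASTER
    local_rx_ok: bool      # bit 13: local receiver status
    remote_rx_ok: bool     # bit 12: remote receiver status
    lp_1000_fd: bool       # bit 11: link partner 1000BASE-T full-duplex capable
    lp_1000_hd: bool       # bit 10: link partner 1000BASE-T half-duplex capable
    idle_errors: int       # bits 7:0: idle error count


def decode_gbt_status(value: int) -> GbtStatus:
    """Split a raw 16-bit register 0x0A value into its named fields."""
    def bit(n: int) -> bool:
        return bool((value >> n) & 1)

    return GbtStatus(bit(15), bit(14), bit(13), bit(12), bit(11), bit(10), value & 0xFF)


def check_resolution(local_0a: int, partner_0a: int) -> str:
    """Compare both sides' register 0x0A; exactly one side should be MASTER."""
    local, partner = decode_gbt_status(local_0a), decode_gbt_status(partner_0a)
    if local.ms_config_fault or partner.ms_config_fault:
        return "configuration fault"
    if local.resolved_master == partner.resolved_master:
        role = "MASTER" if local.resolved_master else "SLAVE"
        return f"mismatch: both resolved to {role}"
    return "consistent: local=" + ("MASTER" if local.resolved_master else "SLAVE")


# Hypothetical values: local PHY MASTER (0x7C00), partner SLAVE (0x3C00).
print(check_resolution(0x7C00, 0x3C00))  # consistent: local=MASTER
```

    If the failing-case dump showed both sides reporting the same role (or bit 15 set), that would confirm a resolution mismatch; a consistent result would point elsewhere.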

    Communication problem occurs with a specific link partner.

    Have you tried pairing the PHY with a different link partner? Does that verify that this behaviour only occurs with this specific link partner?

    Best,

    Vivaan

  • Hi Vivaan-san,

    Sorry for the late reply.

    I have received link partner's register information.

    I will send it to you via private message.

  • Hi Atsushi-san, 

    Thank you for this information. I noticed that there are two columns for each device with different values, and wanted to clarify what each column means.

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your reply.

    I apologize for the lack of explanation.

    The left column shows the register values during communication failure, and the right column shows the register values during normal communication.

  • Hi Atsushi-san, 

    Thank you for clarifying. Unfortunately, I do not see any new information in this register dump. It looks like the only difference between the two cases is whether the 867 is configured as master or slave.

    The only other factor that comes to mind in this scenario is the clock signal. Since the behaviour is only observed when the 867 is master, I would like to verify that the input clock crystal meets the requirements laid out in the datasheet. This clock acts as the internal reference clock, which is embedded into the data sent to the link partner. The link partner then recovers this clock from the data.

    Once we have verified that these requirements are met, we should test the CLK_OUT pin, which outputs this internal reference clock. If it is not already enabled through straps, we can use bit 6 in register 0x170 to activate CLK_OUT. We must make sure that this clock is within ±50 ppm as well. This will tell us whether the PHY's internal clock is the cause of this behaviour. If the clock is within these limits, I would suggest checking the link partner's clock, since the link partner may not be able to properly recover this clock from the embedded data, which may be causing this behaviour.
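    To turn a CLK_OUT (or XI) frequency measurement into the ±50 ppm check, here is a minimal sketch, assuming a 25 MHz nominal reference; if CLK_OUT is configured to output a different clock, pass that nominal frequency instead. The function names are illustrative only. Keep in mind that the instrument's own timebase error adds to the reading, so a frequency counter or a scope with a well-specified timebase is preferable.

```python
def ppm_error(measured_hz: float, nominal_hz: float) -> float:
    """Frequency error of a measured clock, in parts per million of nominal."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def within_tolerance(measured_hz: float, nominal_hz: float = 25e6,
                     tol_ppm: float = 50.0) -> bool:
    """True if |error| <= tol_ppm (defaults assume 25 MHz and +/-50 ppm)."""
    return abs(ppm_error(measured_hz, nominal_hz)) <= tol_ppm


if __name__ == "__main__":
    # Two hypothetical readings, one inside and one outside the limit.
    for f in (25_000_900.0, 24_998_600.0):
        print(f"{f:.0f} Hz -> {ppm_error(f, 25e6):+.1f} ppm, ok={within_tolerance(f)}")
    # 25000900 Hz -> +36.0 ppm, ok=True
    # 24998600 Hz -> -56.0 ppm, ok=False
```

    At 25 MHz, ±50 ppm corresponds to ±1250 Hz, so the reading must fall between 24,998,750 Hz and 25,001,250 Hz.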

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your reply.

    Question 1:

    Could you explain how master/slave relates to the embedded clock?
    It's possible that I don't understand it correctly.

    My understanding is that, regardless of whether a PHY is master or slave, the transmitter (Tx) embeds the clock and the receiver (Rx) recovers it.
    (In other words, both the master and the slave of the link embed and recover the clock.)
    Is my understanding wrong?

    Or does the master always provide the embedded clock in both directions, with the slave only recovering it?

    Question 2:

    Is the clock we should check with an oscilloscope XI?

    The customer system passes compliance testing. Is it still possible that the clock (XI) is the cause?
    I'm not familiar with compliance testing; is it performed in both master and slave roles?

    Question 3:

    How should we check if the link partner is able to properly recover the embedded clock?

  • Hi Atsushi-san, 


    It's possible that I don't understand it correctly.

    I have understood that regardless of whether it is Master or Slave, the transmitter(Tx) embeds the clock and the receiver(Rx) recover it.

    According to the IEEE 802.3ab standard, the master is the PHY whose local clock is used to synchronize data. The master embeds this local clock into the data stream. The slave recovers this clock and uses the recovered clock for everything it transmits back to the master. You are correct that the slave also embeds a clock into its data, but it is the same clock it recovered from the master's data, so ultimately the timing is dictated by the master clock, albeit indirectly.

    For more information on how this works, see IEEE 802.3 Clause 40.

    Question 2:

    Does the clock we should check by oscilloscope mean XI?

    The customer system passes compliance testing. Is it still possible that the clock (XI) is the cause?
    I'm not familiar with compliance testing, but is it something that is tested in both Master/Slave roles?

    That depends on the configuration the customer is using. If the customer is using an oscillator input, XI can be checked to make sure that the requirements mentioned in my last reply are met. If the customer is using a crystal, the crystal specification should indicate whether it meets those requirements. In the crystal case, I would also like to check the CLK_OUT signal, since it is easier to measure reliably.

    Compliance testing does not include clock ppm measurements, so the clock may still be the cause of this behaviour. Compliance testing is not affected by master/slave resolution, so I don't think it is performed in both modes.

    How should we check if the link partner is able to properly recover the embedded clock?

    One way that comes to mind is checking for FIFO errors. Generally, a clock mismatch results in FIFO errors on the receiving side, which in this case would be the STB. Since the STB is not a TI part, I am not sure exactly how to access its FIFO information.

    We could also try an ABA swap and test the 867 in master mode with a PHY other than the STB to narrow down this behaviour.
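    To put rough numbers on the FIFO point: a receive FIFO written with the recovered clock and read with a local clock drifts by one entry every 1/(Δppm × 10⁻⁶) byte times, so it has to absorb the slip accumulated over each burst. This is only a back-of-the-envelope sketch; the FIFO margin is a hypothetical parameter, since the STB's actual buffer depth is unknown.

```python
def slip_byte_times(bytes_between_resync: int, ppm_difference: float) -> float:
    """Byte-clock periods the FIFO write and read sides drift apart over one burst.

    With the writer running (1 + ppm*1e-6) times faster (or slower) than the
    reader, N byte times of traffic leave an excess (or deficit) of N*ppm*1e-6.
    """
    return bytes_between_resync * abs(ppm_difference) * 1e-6


def fifo_survives(bytes_between_resync: int, ppm_difference: float,
                  margin_entries: float) -> bool:
    """True if the accumulated slip stays within a (hypothetical) FIFO margin."""
    return slip_byte_times(bytes_between_resync, ppm_difference) <= margin_entries


if __name__ == "__main__":
    # A 1518-byte frame: in-spec clocks slip a fraction of an entry,
    # while a badly recovered clock (illustrative 2000 ppm) slips several.
    for ppm in (100.0, 2000.0):
        print(ppm, slip_byte_times(1518, ppm), fifo_survives(1518, ppm, 2))
```

    The takeaway is qualitative: in-spec clocks leave plenty of margin, so FIFO errors on the STB side would suggest its recovered clock is far outside the expected tolerance.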

    Best,

    Vivaan

  • Hi Vivaan-san,

    Thank you for your detailed explanation.

    First, let's check the input clock requirements and CLK_OUT of the 867.

    Additional information:

    I found that the rate at which the 867 resolves to master through auto-negotiation is about 50% (that is, slave resolution is about 50%).

    The 867 being resolved as master does not necessarily mean that communication problems will occur.

    However, when problems occur, the 867 is always the master.

    Does this give you any new insight?

    (For example, if it can link up normally as the master, it is unlikely that the clock is the cause.)
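    On the 50% observation: as I understand the MASTER-SLAVE resolution rules in IEEE 802.3 Clause 40.5.1.2 (worth checking against the standard itself), a manual configuration takes priority, then a multiport device is preferred as MASTER over a single-port one, and only when both sides advertise the same port type is the role decided by comparing random 11-bit seeds, with the higher seed becoming MASTER. A roughly 50/50 split is what that last rule produces. This is a simplified sketch: it ignores the fault raised after repeated seed ties, and the names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MsConfig:
    manual: bool = False         # reg 9 bit 12: manual MASTER-SLAVE config enable
    manual_master: bool = False  # reg 9 bit 11: configure as MASTER (if manual)
    multiport: bool = False      # reg 9 bit 10: port type, 1 = multiport device


def resolve(local: MsConfig, partner: MsConfig,
            local_seed: int, partner_seed: int) -> Optional[str]:
    """Local PHY's role: 'MASTER', 'SLAVE', 'FAULT', or None on a seed tie."""
    if local.manual and partner.manual:
        if local.manual_master == partner.manual_master:
            return "FAULT"  # both forced to the same role
        return "MASTER" if local.manual_master else "SLAVE"
    if local.manual:
        return "MASTER" if local.manual_master else "SLAVE"
    if partner.manual:
        return "SLAVE" if partner.manual_master else "MASTER"
    if local.multiport != partner.multiport:
        return "MASTER" if local.multiport else "SLAVE"
    if local_seed == partner_seed:
        return None  # real PHYs regenerate seeds and retry
    return "MASTER" if local_seed > partner_seed else "SLAVE"


def simulate(trials: int, rng: random.Random) -> float:
    """Fraction of resolved link-ups where the local PHY became MASTER."""
    masters = resolved = 0
    for _ in range(trials):
        outcome = resolve(MsConfig(), MsConfig(),
                          rng.getrandbits(11), rng.getrandbits(11))
        if outcome is None:
            continue
        resolved += 1
        masters += outcome == "MASTER"
    return masters / resolved


if __name__ == "__main__":
    print(simulate(10_000, random.Random(1)))  # close to 0.5
```

    Under this model, the 50% rate suggests neither side forces its role and both advertise the same port type, which is consistent with the failures tracking the seed comparison rather than a fixed configuration.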

  • Hi Atsushi-san, 

    Thank you for this additional info. While I would still like to check the clocks, there are some additional things we can also test for. 

    First, I would like to test using the reverse loopback function of the PHY. You can enable reverse loopback in register 0x16, then send packets from the link partner side and check for any idle errors. This would tell us whether the idle errors are caused by the MDI/link partner side or by the MII side of the 867.

    If this test passes and shows that the behaviour is not due to the MDI side, we can check the average clock frequencies of the 867 and the SoC/MAC interface used. This could also be a reason for the idle errors.

    As mentioned in my last reply as well, doing an ABA swap would help us isolate this behaviour to one half or the other, after which we can narrow it down further. 

    Best,

    Vivaan

  • Hi Vivaan-san,

    The customer ran the link-up test many times with another link partner, without forcing master/slave.

    Since the 867 resolves to master about 50% of the time, running a large number of tests covers both roles.

    No problems occurred.

    So we would like to try a loopback test.

    You suggested reverse loopback.

    Is my understanding of attached "Reverse loopback.pdf" correct?

    According to the datasheet, the supported loopback functions differ depending on the MAC I/F and data rate.

    The PHY is configured for 1000 Mbps with SGMII.

    Does the 867 support reverse loopback in this configuration?

    I could not find this in the table below.

    When communication problem occurs, the link is up but there is no ping response.

    Can Loopback still be performed in this case?

    Reverse loopback.pdf

  • Hi Atsushi-san,

    Thank you for confirming that the packet loss behaviour is not observed while testing with other link partners.

    Based on this result, I don't think a reverse loopback test is required. We can already narrow down this behaviour to the link partner based on the ABA swap test.

    As we were discussing in the other thread, I believe this behaviour can be fixed by writing 0x050[1]=0 to disable the scramble mode that the link partner is accidentally enabling.
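    Register 0x050 lies above the 0x00-0x1F range, so on the DP83867 it is reached through the indirect extended-register sequence using REGCR (0x0D) and ADDAR (0x0E); the same sequence applies to register 0x170 mentioned earlier. Below is a minimal read-modify-write sketch that clears only bit 1 and preserves the other bits. `MdioBus`, `FakeMdio`, and the 0x0003 starting value are hypothetical stand-ins for the host's MDIO driver and the real register contents.

```python
from typing import Dict, Protocol

REGCR = 0x0D  # MMD access control register (Clause 22 space)
ADDAR = 0x0E  # MMD access address/data register (Clause 22 space)
DEVAD = 0x1F  # device address used for the DP83867 extended registers


class MdioBus(Protocol):
    """Hypothetical interface to the host's Clause 22 MDIO driver."""
    def read(self, reg: int) -> int: ...
    def write(self, reg: int, value: int) -> None: ...


def _select(bus: MdioBus, ext_reg: int) -> None:
    bus.write(REGCR, DEVAD)           # function = address
    bus.write(ADDAR, ext_reg)         # extended register address
    bus.write(REGCR, 0x4000 | DEVAD)  # function = data, no post-increment


def ext_read(bus: MdioBus, ext_reg: int) -> int:
    _select(bus, ext_reg)
    return bus.read(ADDAR)


def ext_write(bus: MdioBus, ext_reg: int, value: int) -> None:
    _select(bus, ext_reg)
    bus.write(ADDAR, value & 0xFFFF)


def ext_clear_bits(bus: MdioBus, ext_reg: int, mask: int) -> int:
    """Read-modify-write that clears `mask` and returns the value written."""
    new_value = ext_read(bus, ext_reg) & ~mask & 0xFFFF
    ext_write(bus, ext_reg, new_value)
    return new_value


class FakeMdio:
    """In-memory stand-in that emulates the REGCR/ADDAR indirect access."""
    def __init__(self, ext: Dict[int, int]):
        self.ext, self.regcr, self.addr = dict(ext), 0, 0

    def write(self, reg: int, value: int) -> None:
        if reg == REGCR:
            self.regcr = value
        elif reg == ADDAR and self.regcr >> 14 == 0:
            self.addr = value             # address phase
        elif reg == ADDAR:
            self.ext[self.addr] = value   # data phase

    def read(self, reg: int) -> int:
        if reg == ADDAR and self.regcr >> 14 != 0:
            return self.ext.get(self.addr, 0)
        return self.regcr if reg == REGCR else 0


if __name__ == "__main__":
    bus = FakeMdio({0x050: 0x0003})
    print(hex(ext_clear_bits(bus, 0x050, 1 << 1)))  # 0x1
```

    Doing a read-modify-write rather than a blind write avoids disturbing the other bits in 0x050, whose values on the real part are not listed in this thread.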

    Best,

    Vivaan