TLK105L: Communication control board PHY disconnection problem

Part Number: TLK105L
Other Parts Discussed in Thread: TLK105, AM5728

Special report on test issues

1.1 Background

During testing, we found that eth3, a 100 Mbps network port on the communication control board and the serial port expansion board (both use the same PCB), stops responding during temperature changes. The setup is as follows: the CPU accesses PHY1 and PHY2 over one SMI bus, and the PHY addresses are 0x01 and 0x03.

The PHY circuit is shown below; it reuses the TLK105 schematic. The design has been adapted for the DP83822, and the components connected to the DP83822's NC pins have been removed.

After removing the peripheral components, the three pins are tied together by a single trace, as shown in the following figure:

1.2 Test problem phenomenon

According to the HASS test requirements, the TSCP host subrack is placed in the HALT test chamber for temperature cycling from -25°C to 60°C with added vibration. At -20°C, the eth3 network on some communication control boards drops: communication fails and ping is interrupted.

Reading the eth3 register status shows that none of the PHY registers can be read. When the low-level layer of the operating system polls PHY addresses 0 to 31, the PHY's address no longer responds, and every register of that PHY is inaccessible.
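The address poll described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the operating system's actual code: `read_reg` is a hypothetical callback standing in for whatever MDIO read primitive the platform offers, and treating 0x0000/0xFFFF in both ID registers as "no PHY at this address" is an assumption. Registers 2 and 3 are the standard Clause 22 PHY identifier registers.

```python
from typing import Callable, Dict, Iterable, List

PHYIDR1 = 0x02  # Clause 22 PHY identifier register 1
PHYIDR2 = 0x03  # Clause 22 PHY identifier register 2


def scan_mdio_bus(read_reg: Callable[[int, int], int],
                  absent_values: Iterable[int] = (0x0000, 0xFFFF)) -> Dict[int, int]:
    """Poll addresses 0..31 and return {phy_addr: 32-bit ID} for each responder.

    read_reg(phy_addr, reg_addr) is a hypothetical MDIO read primitive.
    An address is treated as empty when both ID registers read back one of
    absent_values (an assumption; adjust it to what the bus really returns).
    """
    absent = set(absent_values)
    found = {}
    for addr in range(32):
        id1 = read_reg(addr, PHYIDR1) & 0xFFFF
        id2 = read_reg(addr, PHYIDR2) & 0xFFFF
        if id1 in absent and id2 in absent:
            continue  # nothing answered at this address
        found[addr] = (id1 << 16) | id2
    return found


def missing_phys(expected_addrs: Iterable[int], found: Dict[int, int]) -> List[int]:
    """Return the expected PHY addresses that did not answer the scan."""
    return sorted(a for a in expected_addrs if a not in found)
```

Running the scan periodically during the temperature cycle and logging `missing_phys([0x01, 0x03], found)` with a timestamp would record when an address stops answering. If empty addresses read back a different pattern on a given board (a later reply in this thread reports 0xfffb), that value can be passed in `absent_values`.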

The symptom matches last year's problem, in which the PHY could not be read at room temperature and the TLK105 was then replaced with the DP83822.

1.3 Modification adjustment and test verification for problem

1.3.1 MDC waveform

The known workaround, rerouting the MDC and MDIO traces, resolves the signal interference but does not address its source. We therefore tested the signals on the above circuit while adjusting the resistance, adjusting the capacitance and inductance at the power supply input and output terminals, and replacing the power supply module. The test record is as follows:

Poor waveform:

Better waveform:

Based on this waveform debugging, we revised the PCB. The revised board has a better test waveform, as shown in the following figure:

The signal waveform test for SMI bus is as follows:

MDC:

MDIO:

The overall waveform is acceptable. Compared with the multi-capacitor scheme, the MDC overshoot of this waveform is about 1% larger, but the rerouting still improves the signal quality.

After 500 rounds of testing, the PHY never disappeared, so the hardware was finalized with this version.

However, similar signal quality problems appeared after the PCB was changed in regular production, so capacitors were added to MDC/MDIO. Because that PCB could not be modified, the capacitors were connected to VCCIO (pull-up). Testing showed that this does reduce the PHY dropout problem on that PCB version, and the boards already produced were reworked according to this plan.

The updated PCB revision was redrawn using the pull-down resistor method, and subsequent boards were produced from it.

Recently, however, a batch of boards using the pull-up capacitor scheme on the MDC/MDIO signal lines showed PHY disconnection in the aging test. The disconnection behaves the same as before.

At present, the main problem is that we cannot determine what state the PHY is in after it disconnects, and the failure process cannot be observed through register access. The network can only be restored by asserting the PHY's hard reset signal. What might cause this situation, and how can we further locate the cause of the failure?

 

  • Hi,

    The TLK105 is currently NRND (not recommended for new designs), and we have limited resources to debug this part. May I ask whether this is for a new design?

    If so, we highly recommend using the DP83822 instead of the TLK105, since it has better EMC performance and a more robust feature set. We also have a rollover document from the TLK105 to the DP83822, and we are glad to help with the DP83822:

    Again, we have limited knowledge of the TLK105 and limited resources for debugging it.

    --

    Sincerely,

    Hillman Lin

  • I am now using the DP83822, but the problem is still the same.

  • Hi,

    Thank you for the explanation. Based on your register log, it seems your MDIO/MDC is not being accessed properly.

    If possible, could you try removing the 2.2 kΩ pull-up on the MDC line and see whether that resolves your register access issue?

    --

    Regards,

    Hillman Lin

  • This resistor is not soldered.

  • Hi,

    May I ask which resistor in the schematic you provided is not soldered?

    --

    Regards,

    Hillman Lin

  • The 2.2 kΩ pull-up on the MDC line, R3U.

  • Hi,

    If possible, could you let me know which PHY ID you use for MDIO access?

    --

    Regards,

    Hillman Lin

  • After eth2 disappears from the bus, the register value of eth3 changes. This happens at low temperatures below 0℃; once the temperature rises, it no longer occurs.
    Log: eth3's register 0x0a changes (the normal default is 0x0104).

    We need to confirm why the eth3 register value changes in this temperature range.
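One way to narrow down a changed register value is to list exactly which bits differ from the known default. The following is a minimal sketch. The observed value 0x0504 in the usage note is purely illustrative, because the thread does not state the changed value, and no meaning is assigned to individual bits here.

```python
from typing import List, Tuple


def changed_bits(default: int, observed: int, width: int = 16) -> List[Tuple[int, int, int]]:
    """Return (bit, default_bit, observed_bit) for each differing bit, MSB first."""
    diff = (default ^ observed) & ((1 << width) - 1)
    return [
        (bit, (default >> bit) & 1, (observed >> bit) & 1)
        for bit in range(width - 1, -1, -1)
        if (diff >> bit) & 1
    ]
```

For example, `changed_bits(0x0104, 0x0504)` (hypothetical observed value) reports that only bit 10 changed from 0 to 1. Logging this list together with the chamber temperature would show whether the same bits flip on every occurrence.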

  • Hi,

    Are you saying that at low temperature your SoC cannot recognize PHY ID 2?

    Is there any communication when you operate at low temperature? Could you also check registers 0x0014 and 0x0015 and see whether you get any IDLE errors or RX_ER?

    If you reset the PHY in the low-temperature case, does that resolve the issue?

    --

    Regards,

    Hillman Lin

  • We changed the two PHY addresses (1 to 2, and 3 to 5) and ran the board at -25 degrees; the problem reproduced. In this case, every eth2 PHY register reads 0xfffb.

    When reading addresses 0 to 31, only address 5 (eth3) returns valid registers; all the others read 0xfffb. The values of registers 0x0014 and 0x0015 for eth3 at the time of the read failure are shown in the figure below.

     

    The other basic status registers of eth3 read as shown in the figure below.

    When the faulty board warms from low temperature to about -7 degrees Celsius and is power-cycled, eth2 initializes abnormally. The eth2 registers can now be read: register 0x01 shows that auto-negotiation is not complete (the value is 0) and the link status is 0.

     

    Then, after entering the commands "ifconfig eth2 down" and "ifconfig eth2 up", the auto-negotiation is successful and the PHY initialization is successful.

    Q1: What is the current status of eth3?

    Q2: What could be the reason why eth2 is not initialized successfully?
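The register 0x01 readout described above can be decoded with a small helper. This is a minimal sketch using the standard Clause 22 Basic Mode Status Register bit positions (bit 5 auto-negotiation complete, bit 4 remote fault, bit 3 auto-negotiation ability, bit 2 link status); the function name and dictionary keys are illustrative choices.

```python
BMSR_AN_COMPLETE = 1 << 5   # auto-negotiation process completed
BMSR_REMOTE_FAULT = 1 << 4  # remote fault condition detected
BMSR_AN_ABILITY = 1 << 3    # PHY is able to perform auto-negotiation
BMSR_LINK_STATUS = 1 << 2   # link is up (latches low on link loss)


def decode_bmsr(value: int) -> dict:
    """Decode selected status bits of Clause 22 register 0x01 (BMSR)."""
    return {
        "autoneg_complete": bool(value & BMSR_AN_COMPLETE),
        "remote_fault": bool(value & BMSR_REMOTE_FAULT),
        "autoneg_ability": bool(value & BMSR_AN_ABILITY),
        "link_up": bool(value & BMSR_LINK_STATUS),
    }
```

Because the link status bit latches low, reading register 0x01 twice and decoding the second value gives the current link state. If the whole register reads 0x0000, the ability bits are also clear, which may point to a read problem rather than a real status, since a working PHY normally reports at least some abilities.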

     

  • Hi

    I would like to double-check three scenarios:

    • eth1 is working fine
    • eth2 is not able to link up
    • eth3 is not able to link up or access registers

    If possible, may I ask what the schematic differences are between eth1, eth2, and eth3? Could you share the schematics of all three PHYs?

    Is there any strap difference between eth1, eth2, and eth3?

    --

    Regards,

    Hillman Lin

  • The CPU's eth0 and eth1 ports are Gigabit Ethernet and use the RGMII interface, while eth2 and eth3 are 100 Mbps Ethernet and use the MII interface. The CPU is an AM5728. eth0 and eth1 use the KSZ9131; they have shown no problems in operation or in the various environmental tests, and the CPU's access to those PHYs is normal. eth2 and eth3 use the DP83822; the problem occurs on one of these two PHYs, most often eth2, as shown in the following picture:

     

     

    Schematic diagram of gigabit network:

     

     

    The two Gigabit PHYs share one SMI bus, which uses the SMI function of pins D3 and F6 of the AM5728, as shown in the following figure:

     

     

    The two 100 Mbps PHYs share one SMI bus, which uses the SMI function of pins AB3 and AA4 of the AM5728, as shown in the following figure:

     

  • Hi,

    I am a bit confused now. It seems you are using a 1 Gbps PHY.

    • Do you have the schematic for the DP83822?
    • If you swap the two DP83822 PHYs, does the issue follow the PHY?

    --

    Regards,

    Hillman Lin

  • Do you have the schematic for the DP83822?

    Please refer to the schematic diagram of the TLK105, which is package-compatible.

    If you swap the two DP83822 PHYs, does the issue follow the PHY?

    The fault does not follow the PHY; so far it follows the board position. Most likely, signal quality makes eth2 prone to problems.

  • Hi,

    The 100 Mbps schematic in this figure is hard for me to read. If possible, could you share a clearer picture of the schematic?

    Thank you for confirming the ABA swap. I will focus on the schematic around eth2.

    --

    Regards,

    Hillman Lin

  • Hi,

    I will review it and provide you with a result later this week.

    --

    Regards,

    Hillman Lin

  • Hi,

    Does removing the pull-up resistor on the MDC line help? We don't want any pull-up resistor on the MDC clock. If the pull-up is too strong, it could disturb the MDIO communication.

    --

    Regards,

    Hillman Lin

  • There is no pull-up resistor on the MDC signal. MDC is currently either left floating or fitted with 10 pF/100 pF capacitors; adding the capacitors to MDC reduces the probability of the PHY dropout problem.

  • Hi,

    Do you have a Saleae logic analyzer? If possible, could you probe the MDIO and MDC lines with it and see whether you are able to decode the traffic?

    Based on the observation, it seems the PHY ID is not being recognized correctly, which would leave the processor unable to talk to the PHY. If possible, could you double-check the PHY ID and make sure it is recognized correctly?

    --

    Regards,

    Hillman Lin
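If the logic analyzer software cannot decode the capture, the MDIO bits can also be decoded by hand. The following is a minimal sketch of a Clause 22 frame decoder. It assumes the MDIO line has already been sampled on each rising edge of MDC into a list of 0/1 values, MSB first, and that the capture starts in idle or preamble; `decode_clause22` and `bits_to_int` are illustrative names, not part of any vendor tool.

```python
from typing import Iterable, List, Optional


def bits_to_int(bits: Iterable[int]) -> int:
    """Pack a sequence of 0/1 values (MSB first) into an integer."""
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    return value


def decode_clause22(samples: Iterable[int]) -> Optional[dict]:
    """Decode the first Clause 22 frame found in MDIO samples.

    Frame layout after the all-ones preamble:
    ST(2)=01, OP(2), PHYAD(5), REGAD(5), TA(2), DATA(16).
    Returns None if no complete, well-formed frame is present.
    """
    bits: List[int] = [b & 1 for b in samples]
    try:
        start = bits.index(0)  # first 0 after preamble is the ST field
    except ValueError:
        return None  # line never left the idle/preamble level
    frame = bits[start:start + 32]
    if len(frame) < 32 or frame[:2] != [0, 1]:
        return None
    op = bits_to_int(frame[2:4])
    if op not in (0b01, 0b10):
        return None
    return {
        "op": "read" if op == 0b10 else "write",
        "phy_addr": bits_to_int(frame[4:9]),
        "reg_addr": bits_to_int(frame[9:14]),
        # On a read, a responding PHY drives the second TA bit low,
        # so with a pulled-up bus the field samples as 0b10.
        "turnaround": bits_to_int(frame[14:16]),
        "data": bits_to_int(frame[16:32]),
    }
```

For a read, a `turnaround` value of 0b11 together with data of 0xFFFF suggests that no PHY drove the bus for the addressed `phy_addr`. Comparing the decoded `phy_addr` with the address the PHY is strapped to shows whether the host is asking for the expected address.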

  • We captured the SMI bus waveforms with an oscilloscope, but when we put the oscilloscope probe to test the MDC/MDIO signals, the probe increased the capacitance on the line, the PHY did not appear abnormal, and the PHY loss problem did not reoccur. We also analyzed the SMI bus using a DSlogic logic analyzer, and under normal conditions, the timing was normal. However, due to the capacitance introduced by the probe, the PHY drop problem could not be reproduced and it was not possible to confirm what exactly was wrong with the PHY. The original ID of this PHY was 0x01, but after the dropout problem, the PHY with this ID could not be recognized and this PHY disappeared from the SMI bus.

  • Hi,

    Regarding the last statement, "The original ID of this PHY was 0x01, but after the dropout problem, the PHY with this ID could not be recognized and this PHY disappeared from the SMI bus."

    • May I ask what the "dropout problem" is?

    The PHY was able to respond to MDIO access with PHY ID 1 before, but something happened and the PHY no longer recognizes PHY ID 1. Is my understanding correct?

    --

    Regards,

    Hillman Lin

  • Hi,

    We suspect that the PHY somehow changes its PHY ID after the "dropout problem". Since the issue follows the board in the ABA swap, this clearly points to a board-level issue that leaves the PHY unable to recognize PHY ID 1.

    • Our main hypothesis is that the strap resistor value on the board changes somehow or sits at the boundary of the strap threshold. If possible, please check the strap resistors that set the PHY ID, or try a different PHY ID. The PHY ID might also duplicate that of another PHY, which would prevent it from being recognized.
    • You can also power up only that specific PHY, without powering the other PHYs, and see whether you can read this PHY at a different PHY ID.

    --

    Regards,

    Hillman Lin