DP83826E: Link never getting established when cfg_rescal_en in CDCR Register is set to true after reset

Eric B

Part Number: DP83826E

I randomly experience link issues in Linux. After some investigation, I discovered that when the problem happens, bit 14 of the CDCR register (cfg_rescal_en) is set after the reset is released. Writing 0 to this bit makes the link come up. Does anybody know anything about this bit? There is no information about it in the datasheet or related documentation for this part, and the bit does not appear in the datasheets of other parts in the same family. For now, the workaround is to clear this bit after reset in the Linux driver, but I do not know whether doing that has side effects.

over 1 year ago
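A minimal sketch of the workaround described above (clearing cfg_rescal_en after reset) is a read-modify-write that clears only bit 14 of CDCR. Assumptions: the CDCR address 0x1E is taken from the DP83822-family register map and should be checked against the DP83826 datasheet, and `mdio_read`/`mdio_write` are hypothetical helpers backed by a simulated register file so the example runs anywhere. As TI cautions later in this thread, writing this bit by hand is not a validated use case and may leave the Ethernet signaling non-compliant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CDCR address (DP83822-family register map); verify for DP83826. */
#define DP83826_CDCR       0x1E
#define CDCR_CFG_RESCAL_EN (1u << 14) /* bit 14, as reported in the post */

/* Hypothetical SMI accessors backed by a simulated register file;
 * replace them with the platform's real MDIO read/write routines. */
static uint16_t sim_regs[32];

static uint16_t mdio_read(uint8_t reg)
{
    assert(reg < 32);
    return sim_regs[reg];
}

static void mdio_write(uint8_t reg, uint16_t val)
{
    assert(reg < 32);
    sim_regs[reg] = val;
}

/* Clears cfg_rescal_en only if it is set, preserving the other CDCR bits.
 * Returns 1 if the bit was found set and cleared, 0 otherwise. */
static int clear_rescal_en_if_set(void)
{
    uint16_t cdcr = mdio_read(DP83826_CDCR);

    if ((cdcr & CDCR_CFG_RESCAL_EN) == 0)
        return 0;
    mdio_write(DP83826_CDCR, (uint16_t)(cdcr & ~CDCR_CFG_RESCAL_EN));
    return 1;
}
```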

Evan Mayhew, over 1 year ago

Hi Eric,

I'm unfamiliar with the effects of clearing this bit.

Can you please try these two register writes to see whether they resolve the link issue:

1) 0x1F = 0x4000 (soft restart)

2) 0x0[9] = '1' (restart auto-negotiation; the bit self-clears)
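The two writes above can be sketched in C as follows. Register 0x1F and the value 0x4000 come from the post; the BMCR address (0x00) and the restart auto-negotiation bit (bit 9, written as 1 and self-clearing) follow the IEEE 802.3 Clause 22 register layout. The macro names and the simulated `mdio_read`/`mdio_write` helpers are hypothetical, and no delays are modeled between the writes.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR            0x00
#define BMCR_ANRESTART      (1u << 9)  /* Clause 22: restart AN, self-clearing */
#define DP83826_PHYRCR      0x1F       /* hypothetical name for register 0x1F */
#define PHYRCR_SOFT_RESTART 0x4000u    /* "soft restart" value from the post */

static uint16_t sim_regs[32]; /* simulated register file (hypothetical) */

static uint16_t mdio_read(uint8_t reg)
{
    assert(reg < 32);
    return sim_regs[reg];
}

static void mdio_write(uint8_t reg, uint16_t val)
{
    assert(reg < 32);
    sim_regs[reg] = val;
}

static void soft_restart_and_renegotiate(void)
{
    /* 1) Soft restart through register 0x1F. */
    mdio_write(DP83826_PHYRCR, PHYRCR_SOFT_RESTART);

    /* 2) Restart auto-negotiation, keeping the other BMCR bits as they are. */
    mdio_write(MII_BMCR, (uint16_t)(mdio_read(MII_BMCR) | BMCR_ANRESTART));
}
```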

Thank you,

Evan

Eric B, over 1 year ago, in reply to Evan Mayhew

Hello Evan,

Sorry for the long delay. This week we did some tests:

Neither the soft restart nor restarting auto-negotiation resolved the issue. Other than clearing the cfg_rescal_en bit, the only way to recover is a hardware reset (reset pin) or a reset through the BMCR.
Increasing the reset assertion time after power-up, or the delay after releasing the hardware reset, did not resolve the issue.
We use the PWRDN pin to prevent link until the system is ready. The problem happens (randomly) only in this sequence:
1. Power up with the reset pin and PWRDN both asserted.
2. Release the reset pin and keep PWRDN asserted.
3. Wait a few seconds until the system is ready, then release the PWRDN pin to allow the link to be established.
4. The link never comes up, and the cfg_rescal_en bit stays set.
Releasing PWRDN at the same time as the reset pin resolves the issue (we confirmed the problem disappears), but in our situation, it is important to boot the PHY in an accessible state without having link until the system is ready to communicate.

Evan Mayhew, over 1 year ago, in reply to Eric B

Hi Eric,

Thanks for the follow-up.

Eric B said:
it is important to boot the phy accessible without having link until the system is ready to communicate.

Can you clarify this requirement? What is the concern if PHY is linked up before system is ready to communicate?

Rather than using PWRDN pin, please try writing 0x0[11] = '1' to enable power down, and writing '0' for normal operation after the system is ready to communicate. Does the issue still occur in this case?
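Toggling the IEEE power-down bit suggested above is a read-modify-write on BMCR bit 11. A minimal sketch, assuming the Clause 22 BMCR address 0x00 and hypothetical simulated SMI helpers in place of the platform's real accessors:

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR   0x00
#define BMCR_PDOWN (1u << 11) /* IEEE power-down: 1 = power down, 0 = normal */

static uint16_t sim_regs[32]; /* simulated register file (hypothetical) */

static uint16_t mdio_read(uint8_t reg) { assert(reg < 32); return sim_regs[reg]; }
static void mdio_write(uint8_t reg, uint16_t val) { assert(reg < 32); sim_regs[reg] = val; }

/* Sets or clears BMCR bit 11 without disturbing the other BMCR bits. */
static void phy_power_down(int enable)
{
    uint16_t bmcr = mdio_read(MII_BMCR);

    if (enable)
        bmcr = (uint16_t)(bmcr | BMCR_PDOWN);
    else
        bmcr = (uint16_t)(bmcr & ~BMCR_PDOWN);
    mdio_write(MII_BMCR, bmcr);
}
```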

Thank you,

Evan

Eric B, 11 months ago, in reply to Evan Mayhew

Hello Evan

The requirement to keep the PHY isolated until the device is ready exists because the same PHY can be used for TCP/IP or EtherCAT. EtherCAT requires the link to stay down until the device is ready to communicate. This is one reason we chose this PHY: we can use the PWRDN pin during asynchronous power-up.

Using bit 11 instead of the PWRDN pin can introduce a small window in which the link can come up after the reset is released. Some EtherCAT masters are very sensitive and can generate errors when this happens.

Do you know how I could get more information about the cfg_rescal_en bit? Is it really related to cable testing? This bit is set after reset and clears when everything works well, so I guess it is not only a read/write configuration bit but is also used by the hardware?

In the meantime, we are analyzing all signals at boot to at least double-check that the PHY's electrical requirements are met.

Thank you

Gerome Cacho, 11 months ago, in reply to Eric B

Hello,

Evan is out of office today and will be back tomorrow.

Sincerely,

Gerome

Evan Mayhew, 11 months ago, in reply to Gerome Cacho

Hi Eric,

I understand, thank you for clarifying the EtherCAT requirements.

cfg_rescal_en is unrelated to cable test, this bit handles internal PHY resistor calibration. We do not expect to configure this bit for initialization, but there are ongoing internal discussions to confirm this.

I have two more tests to confirm:

1) Does writing 0x1F = 0x8000 resolve the link issue? If so, is this an acceptable workaround (polling link status and periodically writing 0x1F = 0x8000 until the link comes up)?

2) What is the rate of link failure per power cycle?

Thank you,

Evan

Eric B, 11 months ago, in reply to Evan Mayhew

Hello Evan,

Evan Mayhew said:
cfg_rescal_en is unrelated to cable test, this bit handles internal PHY resistor calibration. We do not expect to configure this bit for initialization, but there are ongoing internal discussions to confirm this.

This bit should be zero after reset, but I see it high after reset and going low when the issue does not happen. Something is happening there, even if it is unrelated to cable testing. Could this bit be related to the power-down pin being active at boot?

The Linux driver already writes 0x1F = 0x8000 with this line, and the issue still happens:

phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);

The proposed workaround may defeat the purpose of using this PHY, and it also means all the configuration done by the driver is lost.

The problem can occur anywhere from the 2nd to the 15th power cycle.

Evan Mayhew, 11 months ago, in reply to Eric B

Hi Eric,

Thanks for confirming the rate of link failure. Please allow me some time to confirm suggestions for this, there are ongoing internal discussions for this same issue.

Thank you,

Evan

Evan Mayhew, 11 months ago, in reply to Evan Mayhew

Hi Eric,

Sorry for the delay here.

Writing 0x1F = 0x8000 is the recommended workaround from our design team.

I understand the concern that this resets the PHY registers after driver initialization. Are you able to link up without driver configuration?

If so, we can implement this logic:

1) Power PHY, wait for link

2) If link comes up, continue with driver initialization

3) If link fails, apply 0x1F = 0x8000. Proceed with driver init if this resolves link
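The three-step flow above could look roughly like the sketch below. Assumptions: link status is read from BMSR (0x01) bit 2, which Clause 22 defines as latching low, so it is read twice; 0x8000 written to 0x1F is the reset value named in this thread; the polling count, macro names, and the simulated register model (including the `sim_link_after_reset` knob and its simplified reset) are hypothetical. Note that Eric Lewis reports later in the thread that this reset did not recover the link in their tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MII_BMSR        0x01
#define BMSR_LSTATUS    (1u << 2)  /* link status, latching low (Clause 22) */
#define DP83826_PHYRCR  0x1F       /* hypothetical name for register 0x1F */
#define PHYRCR_HW_RESET 0x8000u    /* reset value recommended in the thread */

/* Simulated register file and behavior (hypothetical, for testing only). */
static uint16_t sim_regs[32];
static bool sim_link_after_reset; /* does a reset bring the link up? */

static uint16_t mdio_read(uint8_t reg)
{
    assert(reg < 32);
    return sim_regs[reg];
}

static void mdio_write(uint8_t reg, uint16_t val)
{
    assert(reg < 32);
    sim_regs[reg] = val;
    if (reg == DP83826_PHYRCR && (val & PHYRCR_HW_RESET)) {
        /* Simplified reset model: all registers are zeroed (real defaults
         * differ, but driver configuration is lost either way). */
        for (int i = 0; i < 32; i++)
            sim_regs[i] = 0;
        if (sim_link_after_reset)
            sim_regs[MII_BMSR] = BMSR_LSTATUS;
    }
}

static bool link_is_up(void)
{
    /* On real hardware the first read returns the latched value and the
     * second one reflects the current link state. */
    (void)mdio_read(MII_BMSR);
    return (mdio_read(MII_BMSR) & BMSR_LSTATUS) != 0;
}

static bool wait_for_link(int polls)
{
    for (int i = 0; i < polls; i++) {
        if (link_is_up())
            return true;
        /* a real implementation would sleep between polls */
    }
    return false;
}

/* Returns true when driver initialization can proceed. */
static bool bring_up_before_driver_init(int polls)
{
    if (wait_for_link(polls))                    /* steps 1 and 2 */
        return true;
    mdio_write(DP83826_PHYRCR, PHYRCR_HW_RESET); /* step 3 */
    return wait_for_link(polls);
}
```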

Thank you,

Evan

Eric Lewis, 11 months ago, in reply to Evan Mayhew

Hi Evan,

Another Eric here just to mix you up!

I was working with Eric B. to debug this, particularly to debug it with an oscilloscope and an opened product.

Since debugging in the U-Boot and Linux drivers was slow and difficult, I tried a bare-metal standalone application flashed into SPI flash. At first, asserting and releasing only the reset and power-down signals, I could not easily reproduce the problem by switching the power off and on again, so I initially thought it was related only to power-up.

We then decided to run our simple PHY link test application in loops without removing power. It reproduced the problem (between 2 and 10 failures per 50 tries). Since this test does not involve U-Boot or Linux and depends only on two hardware signals, I believe it answers your previous question about the driver. The reset assertion time after power-up was doubled from 25 µs to 50 µs, and the "quiet" period of 2 ms on the SMI (MDC/MDIO) was doubled to 4 ms. We finally checked the cfg_rescal_en bit and found it was set whenever the link was down. Writing this bit to 0 always brought the link up.

We then bought die revision 1 DP83826 parts (we still have many die revision 0 parts) and soldered one on the board. We saw that something changed in the configuration, because fast link pulses (FLPs) are now disabled (we don't have pull-ups or pull-downs on each PHY strap pin and were relying on each pin's default state set by the internal pull-up/pull-down). SOR1 and SOR2 have been checked from the beginning and, apart from this small change with die revision 1, everything is consistent with our intent.

The tests were done on the die revision 1 PHY with the same results.

Many different sequences were tried over more than a week. We increased the number of loops, and writing 0 to cfg_rescal_en was the only thing that fixed the problem. With that workaround, the PHY was reset more than 40K times over the weekend without any problem.

I want to emphasize Eric B.'s answers to your questions: writing 0x8000 to 0x1F does not solve the problem.

We suspect that while the PHY is in power-down, it is sometimes unable to complete its analog calibration using the external Rbias resistor. We don't know what determines whether it succeeds. We also added delays after releasing power-down before accessing the PHY through the SMI, as if a requirement similar to the reset timing existed, but it did not change anything.

I looked at your DP83826 EVM, and the PWDN pin goes to the header on the breakout board. Unless this pin is held low and released long after the reset is released, you won't be able to reproduce the problem easily yourself.

I also looked at the schematic given for using the DP83826 for EtherCAT (SNLA344C). This document updates much of what I saw on the forums about the Cext value and more. I don't know whether a real board exists for this schematic, but the power-down pin has only a pull-up, so it is surely not used if such a board exists.

Because U-Boot and Linux take a long time to boot compared with the small microcontrollers that usually run the EtherCAT slave controller stack, we must be able to keep the link down until we know whether the DP83826 is used with one of the "TCP/IP" protocols or with EtherCAT. The PHY configuration needs to be changed while the link is down to avoid breaking an EtherCAT loop while the software running the stack is unavailable. This is why we chose this PHY: its power-down pin lets us do this.

I therefore suspect that using the power-down pin as intended in the PHY specification is uncommon, which is probably why we are having issues with this PHY.

We recently changed how U-Boot and Linux handle the reset and power-down signals, releasing them one at a time instead of simultaneously, so that the proper PHY configuration can be applied through SMI for the mode in use ("TCP/IP" or EtherCAT), as intended in the product design. This change revealed the problem.
We never had link issues when both signals were released at the same time.

I hope this clarifies exactly what problem we are seeing. I understand that it is difficult to get answers about a rarely used mode that requires very detailed information from your R&D team.

If you need more details or have further questions, let me know.

Best regards,

Eric Lewis

Eric B, 11 months ago, in reply to Eric Lewis

Hello Evan,

I would add to Eric Lewis's reply that it would be really useful to understand the root cause and what this bit (cfg_rescal_en) is used for. Don't get me wrong, but writing 0x1F = 0x8000 is a bit like asking someone to reboot their computer whenever something goes wrong. We sell products with this PHY, and complete information is essential to make sure everything can be resolved without a redesign with another PHY.

Best regards

Evan Mayhew, 11 months ago, in reply to Eric B

Hi Eric, Eric,

Thank you for the detailed explanation of the problem history and test cases. I share your frustration with the lack of context on cfg_rescal_en; I am still working with R&D to address this.

In the meantime, I'd like to explore workarounds that do not rely on the PWDN pin.

Eric B said:
Using bit 11 instead of the PWRDN pin can introduce a small window in which the link can come up after the reset is released. Some EtherCAT masters are very sensitive and can generate errors when this happens.

Can we confirm this sensitivity when using 0x0[11], rather than the PWDN pin, to assert power-down? Registers should be accessible for a brief window before the link can come up; we may be able to use this timing to assert power-down through the registers instead.

Thank you,

Evan

Eric Lewis, 11 months ago, in reply to Evan Mayhew

Hi Evan,

We tested writing BMCR bit 11 once the required 2 ms had elapsed after reset release. Thousands of loops were run, either by cycling power or by simply starting a new loop iteration after asserting the PHY reset. Both methods showed no problems, and the PHY linked every time.

The problem with this method is that the U-Boot and Linux drivers access the BMCR from many functions (both PHY-specific and generic ones). The drivers (board-specific files and generic PHY files) must be modified to detect that the IEEE power-down bit is set and, for example, avoid the standard reset through BMCR bit 15. Every new U-Boot or Linux version requires comparing the code against these changes and re-applying the fix. Using the power-down pin requires far fewer driver modifications: it can be a single I/O state asserted by a first-stage boot loader or an FPGA and released later, after the driver has done its usual work.
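The driver change described above amounts to filtering BMCR writes while isolation is required. A hypothetical sketch (the `hold_power_down` flag and `guarded_bmcr_write` wrapper are illustrative names, not U-Boot or Linux APIs, and the register file is simulated) that drops the Clause 22 reset bit (15) and forces the power-down bit (11) while the flag is set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MII_BMCR   0x00
#define BMCR_RESET (1u << 15) /* Clause 22 software reset */
#define BMCR_PDOWN (1u << 11) /* Clause 22 power-down */

static uint16_t sim_regs[32]; /* simulated register file (hypothetical) */
static bool hold_power_down;  /* set until the stack is ready for link */

static void mdio_write(uint8_t reg, uint16_t val)
{
    assert(reg < 32);
    sim_regs[reg] = val;
}

/* Every BMCR write from generic or board code would go through here. */
static void guarded_bmcr_write(uint16_t val)
{
    if (hold_power_down) {
        val = (uint16_t)(val & ~BMCR_RESET); /* skip the standard reset */
        val = (uint16_t)(val | BMCR_PDOWN);  /* keep the PHY powered down */
    }
    mdio_write(MII_BMCR, val);
}
```

This also shows why the approach is invasive: every code path that writes BMCR would have to be routed through such a wrapper, which is the maintenance burden described in the post.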

When can we expect to receive information about this problem? Will your support team try to reproduce it using a pull-down on the power-down/interrupt pin and releasing it later than the reset? On our side, it seems fairly easy to reproduce; power-cycling the board is not even necessary.

Otherwise, others will also believe the power-down pin can be used and will run into the same problems we had.

Could you please explain why you want to explore workarounds that do not rely on the PWDN pin?

Best regards,

Eric Lewis

Evan Mayhew, 11 months ago, in reply to Eric Lewis

Hi Eric,

This appears to be an edge case in the PHY's power sequence when powering up with the PWRDN pin driven low. As you have confirmed that setting power-down mode through register access does not cause the issue, there may be another workaround using the pin.

Eric B said:
We use the PWRDN pin to prevent link until the system is ready. The problem happens (randomly) only in this sequence:

1. Power up with the reset pin and PWRDN both asserted.
2. Release the reset pin and keep PWRDN asserted.
3. Wait a few seconds until the system is ready, then release the PWRDN pin to allow the link to be established.
4. The link never comes up, and the cfg_rescal_en bit stays set.

For this sequence, please test with step 1 changed so that the PWRDN pin is driven low only after the PHY is powered (with reset still active).

The moment PWRDN is driven low can be placed in the short window between PHY power-up and link-up.

Do you see the issue occur in this case, and does this satisfy the system requirements?

Thank you,

Evan

Eric Lewis, 11 months ago, in reply to Evan Mayhew

Hi Evan,

Thanks for the answer.

I believe it is not related only to PHY power-up, because our latest tests showed the problem even when the PHY power was kept on and only a hardware reset was done (with the power-down pin asserted at the same time). This looped reset test reproduced the problem.

In fact, these looped reset tests were much quicker to run. After releasing the reset and then the power-down (the latter either by pin or by BMCR), the BMSR was read twice 3 seconds later to check whether the link was present, then the CDCR register was read and, if the cfg_rescal_en bit was set, it was cleared. Three seconds later, the BMSR was read twice again and the CDCR was read again. Every time cfg_rescal_en was found set after the initial 3-second wait (confirming no link), clearing it always showed the link up 3 seconds later. Each loop iteration without the problem took about 5 seconds. Collecting error counts was much quicker with a standalone test application than by modifying code in U-Boot. Signals were captured on an oscilloscope to make sure they followed the intended sequence.
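The per-iteration check described above could be sketched as follows. Assumptions: BMSR (0x01) bit 2 is the Clause 22 latching-low link status, hence the double read; CDCR is assumed to be at 0x1E (DP83822-family map); the 3-second waits are left to the caller; and the simulated register model, in which clearing cfg_rescal_en lets the link come up (mirroring the behavior reported here), is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMSR           0x01
#define BMSR_LSTATUS       (1u << 2)  /* link status, latching low (Clause 22) */
#define DP83826_CDCR       0x1E       /* assumed address, verify for DP83826 */
#define CDCR_CFG_RESCAL_EN (1u << 14)

enum iter_result { LINK_OK, RECOVERED_AFTER_CLEAR, STILL_DOWN };

static uint16_t sim_regs[32]; /* simulated register file (hypothetical) */

static uint16_t mdio_read(uint8_t reg)
{
    assert(reg < 32);
    return sim_regs[reg];
}

static void mdio_write(uint8_t reg, uint16_t val)
{
    assert(reg < 32);
    sim_regs[reg] = val;
    /* Simulation only: once cfg_rescal_en is cleared, the link comes up,
     * mirroring the behavior reported in this thread. */
    if (reg == DP83826_CDCR && (val & CDCR_CFG_RESCAL_EN) == 0)
        sim_regs[MII_BMSR] = (uint16_t)(sim_regs[MII_BMSR] | BMSR_LSTATUS);
}

static int link_up(void)
{
    /* On real hardware the first read returns the latched value and the
     * second one reflects the current link state. */
    (void)mdio_read(MII_BMSR);
    return (mdio_read(MII_BMSR) & BMSR_LSTATUS) != 0;
}

/* One loop iteration, called after the initial 3 s wait. */
static enum iter_result check_iteration(void)
{
    if (link_up())
        return LINK_OK;

    uint16_t cdcr = mdio_read(DP83826_CDCR);
    if (cdcr & CDCR_CFG_RESCAL_EN)
        mdio_write(DP83826_CDCR, (uint16_t)(cdcr & ~CDCR_CFG_RESCAL_EN));

    /* The test in the thread waits another 3 s here before re-checking. */
    return link_up() ? RECOVERED_AFTER_CLEAR : STILL_DOWN;
}
```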

So, if I am not mistaken, the requested test with step 1 changed has already been done, because power to the PHY was already applied and never removed during these "reboot" tests.

If you want me to check something else, let me know.

Best regards,

Eric Lewis

Evan Mayhew, 11 months ago, in reply to Eric Lewis

Hi Eric,

Eric Lewis said:
even if the PHY power was kept on and only a hardware reset was done (with the power-down pin asserted at the same time).

For this test, was PWRDN pin driven low during the initial power-up for the PHY?

Thank you,

Evan

Eric Lewis, 11 months ago, in reply to Evan Mayhew

Hi Evan,

No, the power-down pin was not driven low during the initial (and only) power-up.

There is a pull-up on the power-down/interrupt pin, as a provision to detect link changes through an interrupt (if ever needed). There is a pull-down on the reset pin.

During my tests with the standalone application, the FPGA driving the PHY signals asserted neither the power-down pin nor the reset at power-up.

Since our platform uses an SoC, the first-stage boot loader must load the FPGA bitstream before the processor can drive the FPGA pins. Neither the reset nor the power-down pin is driven by the FPGA until the bitstream is loaded (both pins are tri-stated until the processor enables them). Reset is held asserted by the pull-down.

An external 25 MHz oscillator supplies the reference clock as soon as the 1V8 and 3V3 rails are applied to the PHY (the oscillator is powered from the same 1V8 rail as the PHY).

Once loaded from non-volatile memory, the standalone test application simply put the pins in the required state (resetting the PHY and asserting power-down), then released them with proper timing, performed the desired sequence through SMI, recorded the results, and started a new test loop by re-asserting reset and power-down.

Many sequences were tried, some involving a wait before SMI accesses after releasing power-down, because no timing requirements are defined for this external signal. Even after waiting several seconds before accessing the SMI, the link was sometimes not up, and reading the CDCR confirmed the stalled analog calibration, with the cfg_rescal_en bit set.

As I said, it is fairly simple to reproduce; you should be able to reproduce it within 50 loops. Over a weekend, I ran more than 40K loops without the problem when clearing the cfg_rescal_en bit as a workaround.

Best regards,

Eric Lewis

Evan Mayhew, 11 months ago, in reply to Eric Lewis

Hi Eric,

We have reproduced this issue and confirmed it as a known design bug.

The possible workarounds are to assert power-down through register writes instead of the PWRDN pin, or to soft reset the PHY in the failing case.

Apologies for the challenges here.

Thank you,

Evan

Eric Lewis, 11 months ago, in reply to Evan Mayhew

Hi Evan,

Thanks for confirming what we have seen so far on our side and for your generous support through these investigations.

May I ask you some things?

1) Could you request that the next datasheet revision describe the issue with the power-down pin and offer the alternatives you proposed to us? Others would benefit from it.
2) Could you request that the next datasheet revision define the proper access type for the cfg_rescal_en bit? If customers do not need to touch this bit, it should probably be marked as reserved (must be 0) instead of R/W.
3) From my extensive reading of your forums, am I right that if we still use the power-down pin as intended and clear the cfg_rescal_en bit when it is stuck high, this would probably marginally degrade the compliance of the Ethernet signals? I know this is a tough one, but I ask it anyway.
4) Do the other PHYs that support EtherCAT and have a powerdown pin suffer from the same problem? You can answer me privately if preferred.

Best regards,

Eric Lewis

Evan Mayhew, 11 months ago, in reply to Eric Lewis

Hi Eric,

1,2) I have noted to clarify these points in the next datasheet revision. Thank you for bringing this issue to our attention.

3) This is a fair assumption; we have not validated any use case that writes this bit manually. Writing this bit manually bypasses the internal resistor calibration that gates link-up, so the Ethernet signaling may not be compliant.

4) DP83826 is the only PHY we see this issue on.

Thank you,

Evan

Eric Lewis, 10 months ago, in reply to Evan Mayhew

Hi Evan,

Thanks again for the answers and your precious support.

FYI:

For a test, we deliberately violated the 2 ms delay required after reset release before accessing the PHY registers through SMI. We assumed that doing only reads should not disturb the PHY.
At our application's register read rate, we observed that cfg_rescal_en appeared set in CDCR about 150 µs after reset release and cleared approximately 300 µs later, when neither the power-down pin nor the IEEE power-down bit in BMCR was used.

As mentioned in an earlier post, setting the IEEE power-down bit in BMCR 2 ms after reset release did not produce the problem. We also tried asserting the power-down pin 2 ms after reset release and releasing it later, and that did not produce the problem either.

We might use this latter option as a simpler fix. Other tests that properly emulate the BMCR reset performed by the U-Boot and Linux drivers still have to be conducted, but this is promising.

Best regards,

Eric Lewis

Evan Mayhew, 10 months ago, in reply to Eric Lewis

Hi Eric,

Of course, thank you for your patience while tackling this problem with limited information.

I'm glad to see there is a workaround that is working for this. Please let me know if there is any unexpected behavior while implementing this workaround.

Thank you,

Evan

Eric B, 10 months ago, in reply to Evan Mayhew

Hello Evan,

I waited to test the workaround in the final application before clicking the resolve button. I can confirm that the workaround works great: keep the power-down pin deasserted during the reset cycle (by pin or by the hardware reset bit), wait 2 ms after reset, then assert power-down.
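The confirmed sequence could be sketched as below. The GPIO and delay hooks are hypothetical stand-ins that only record a trace, so the ordering can be checked off-target; both signals are treated as active low (reset held asserted by a pull-down, PWRDN/INT released by a pull-up), as described earlier in the thread. The 50 µs reset pulse is the value used in Eric Lewis's tests, and the 2 ms delay comes from the post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical board hooks: they only append to a trace buffer. Replace
 * them with real GPIO and delay routines on the target. */
static char trace[256];

static void log_step(const char *step)
{
    assert(strlen(trace) + strlen(step) + 2 <= sizeof trace);
    strcat(trace, step);
    strcat(trace, ";");
}

static void reset_pin_set(int level) { log_step(level ? "RST=1" : "RST=0"); }
static void pwrdn_pin_set(int level) { log_step(level ? "PWD=1" : "PWD=0"); }

static void delay_us(unsigned us)
{
    char buf[24];
    snprintf(buf, sizeof buf, "wait%uus", us);
    log_step(buf);
}

/* Reset cycle with power-down released, then power-down asserted 2 ms
 * after the reset release (both pins active low: 0 = asserted). */
static void phy_reset_then_hold_link_down(void)
{
    pwrdn_pin_set(1);  /* keep power-down deasserted during the reset cycle */
    reset_pin_set(0);  /* assert reset */
    delay_us(50);      /* reset pulse width used in the thread's tests */
    reset_pin_set(1);  /* release reset */
    delay_us(2000);    /* wait 2 ms after reset release */
    pwrdn_pin_set(0);  /* assert power-down: link stays down */
}

/* Called once the stack knows which protocol (TCP/IP or EtherCAT) it runs. */
static void phy_allow_link(void)
{
    pwrdn_pin_set(1);
}
```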

To me, it is the best compromise: it keeps the usefulness of this pin and reduces the work needed in U-Boot and the kernel to prevent a BMCR reset from breaking the isolation during boot.

Have a great day
