DP83826E: Link Drop Issues

Part Number: DP83826E

I wanted to post a resolution to this thread, but it is locked.  Can you unlock it?

e2e.ti.com/.../dp83826e-dp83826e

  • Hi Kevin,

    I cannot unlock this thread, but please feel free to post the resolution here and I will make sure it gets back to David.

    Sincerely,

    Gerome

  • We had purchased the same motor controller as before (the manufacturer had substituted a part on the board), but there was a newer variant that used the DP83826 where the older one used the KSZ8081. We had link drops only on the version with the DP83826. Several factors contributed to these link drops, but the biggest culprit was the type of cabling we were using. It was hard to see because EtherCAT worked well with the KSZ8081-based motor controller and the link never dropped in that configuration, so we had ruled out the cabling.

    The failure mode was that we would get EtherCAT Rx PHY errors but no CRC errors. After discussing this further, it seemed these were corrupted symbols between frames, and after decoding the signal on the Tektronix scope, that was exactly what we saw. The hard part was that we never found a smoking gun on the EtherCAT signals that would explain why the link dropped.
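    The reasoning above (PHY errors rising while CRC errors stay flat points at symbols corrupted between frames rather than inside them) can be sketched as a comparison of two counter snapshots. This is a minimal sketch assuming the controller's diagnostics expose a per-port PHY (RX_ER) error count and a CRC error count; the `PortCounters` class and its field names are hypothetical, not the motor controller's or TwinCAT's actual API.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Snapshot of per-port error counters (field names are hypothetical)."""
    rx_phy_errors: int  # symbol errors reported by the PHY (RX_ER asserted)
    crc_errors: int     # frames that failed the Ethernet FCS check


def classify_errors(before: PortCounters, after: PortCounters) -> str:
    """Describe where errors occurred between two snapshots.

    Assumes the counters did not wrap or saturate between snapshots.
    """
    phy_delta = after.rx_phy_errors - before.rx_phy_errors
    crc_delta = after.crc_errors - before.crc_errors
    if phy_delta == 0 and crc_delta == 0:
        return "clean"
    if phy_delta > 0 and crc_delta == 0:
        # Symbol errors that never damaged a frame: most likely during idle,
        # i.e. between frames, which matches what the scope decode showed.
        return "inter-frame symbol errors"
    # Any CRC increase means at least one frame's contents were corrupted.
    return "in-frame corruption"


if __name__ == "__main__":
    print(classify_errors(PortCounters(10, 2), PortCounters(25, 2)))
```

    Run as a script, this prints `inter-frame symbol errors`, the pattern described in the post.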

    We were eventually able to access the PHY configuration registers inside the manufacturer's motor controllers for both PHY chips (via TwinCAT), because EtherCAT would stay up long enough to communicate for short periods. We found that on the older units with the KSZ8081, the fast link down feature was not enabled (register 0x0B set to 0x00), but on the DP83826 it was enabled (0x0B set to 0x03). We then tried writing to the PHY: we turned off fast link down on the DP83826 (0x0B set to 0x00), and suddenly EtherCAT worked as reliably as it had on the KSZ8081. With fast link down turned off, the motor control software's EtherCAT diagnostics, which in my opinion are quite comprehensive, showed no PHY errors of any kind. This led me to conclude that fast link down accounted for the difference in behavior we were seeing between the two PHY chips.
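    The register change described above amounts to a Clause 22 MDIO write to register 0x0B. The sketch below is a hedged illustration, not the TwinCAT path actually used: `MdioBus`, `FakeBus`, and `write_fld_bits` are hypothetical names, and `FLD_MASK = 0x03` assumes that only the two bits observed set on the DP83826 units need to change. It uses a read-modify-write so any other bits in 0x0B are preserved, then confirms the result by reading it back. Check the DP83826 datasheet for the exact bit definitions before writing to real hardware.

```python
from typing import Dict, Protocol, Tuple

FLD_REG = 0x0B   # register the thread reports holding the fast link down setting
FLD_MASK = 0x03  # ASSUMPTION: only the bits seen set on the DP83826 units (0x03)


class MdioBus(Protocol):
    """Hypothetical Clause 22 MDIO interface (the thread used TwinCAT instead)."""

    def read(self, phy_addr: int, reg: int) -> int: ...

    def write(self, phy_addr: int, reg: int, value: int) -> None: ...


def write_fld_bits(bus: MdioBus, phy_addr: int, value: int) -> int:
    """Set the fast link down bits to `value`, keep other bits, return read-back."""
    if not 0 <= phy_addr <= 31:
        raise ValueError("Clause 22 PHY addresses are 0-31")
    old = bus.read(phy_addr, FLD_REG)
    # Read-modify-write: clear the masked bits, then OR in the new ones.
    new = (old & ~FLD_MASK & 0xFFFF) | (value & FLD_MASK)
    bus.write(phy_addr, FLD_REG, new)
    readback = bus.read(phy_addr, FLD_REG)
    if readback != new:
        raise RuntimeError(f"read back 0x{readback:04X}, expected 0x{new:04X}")
    return readback


class FakeBus:
    """In-memory stand-in so the sketch runs without hardware."""

    def __init__(self) -> None:
        self.regs: Dict[Tuple[int, int], int] = {}

    def read(self, phy_addr: int, reg: int) -> int:
        return self.regs.get((phy_addr, reg), 0)

    def write(self, phy_addr: int, reg: int, value: int) -> None:
        self.regs[(phy_addr, reg)] = value & 0xFFFF  # registers are 16 bits


if __name__ == "__main__":
    bus = FakeBus()
    bus.write(1, FLD_REG, 0x03)              # value found on the DP83826 units
    print(hex(write_fld_bits(bus, 1, 0x00)))  # fast link down off: prints 0x0
```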

    Testing the custom cabling with a Fluke 100BASE-TX cable tester revealed very poor NEXT (near-end crosstalk) and ACR (attenuation-to-crosstalk ratio) performance.
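    The two tester results are related: ACR at a given frequency is the NEXT loss minus the insertion loss (attenuation), both in dB, so a cable with poor NEXT also tends to fail ACR. The sketch below computes ACR from per-frequency readings and the worst-case margin against a limit line. `acr_db` and `worst_margin` are hypothetical helpers, the numbers in the usage line are illustrative only, and a Fluke tester reports these values and applies its own pass/fail limits.

```python
from typing import List, Sequence


def acr_db(next_loss_db: Sequence[float], insertion_loss_db: Sequence[float]) -> List[float]:
    """ACR (dB) at each test frequency: NEXT loss minus insertion loss."""
    if len(next_loss_db) != len(insertion_loss_db):
        raise ValueError("need one NEXT and one insertion-loss reading per frequency")
    return [n - il for n, il in zip(next_loss_db, insertion_loss_db)]


def worst_margin(measured_db: Sequence[float], limit_db: Sequence[float]) -> float:
    """Smallest (measured - limit) across frequencies; negative means a failure."""
    if len(measured_db) != len(limit_db):
        raise ValueError("need one limit per measured value")
    return min(m - lim for m, lim in zip(measured_db, limit_db))


if __name__ == "__main__":
    # Illustrative readings at two frequencies, not data from the thread.
    acr = acr_db([40.0, 30.0], [5.0, 12.5])
    print(acr, worst_margin(acr, [30.0, 20.0]))  # [35.0, 17.5] -2.5
```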

    I want to note that the final fix was to change the cabling to the correct type, not to turn off fast link down, even though that was an option. With the correct cabling and fast link down still turned on, the link worked reliably with no losses. We are going to ask the motor manufacturer to give us access to the fast link down settings in their software for future troubleshooting.

    For me, the difficulties in this issue were:

    1.  Not having a clear understanding, from either the PHY manufacturers or IEEE 802.3, of the full set of functions that can cause link drops. How that block works seems to be deeply buried or not documented anywhere. I think PHY manufacturers should be more transparent about this.

    2.  Not understanding how fast link down actually works in the chip, or which monitoring and measurement thresholds actually trigger a link loss. According to the register description, signal/energy loss is one trigger, but I have no idea what that means or how it is quantified, so as a user I cannot check against it. Another trigger is signal-to-noise ratio, but the datasheet gives no benchmark for the SNR value that triggers fast link down, so I have no way to tell whether my link is above or below that threshold.

    3.  Not having direct access to the PHY registers from the motor controller software was a nightmare to overcome.

    4.  I never saw any obvious signal interruption, excessive noise, or signal loss on the EtherCAT signals in the time domain on the scope, even though those are the conditions that trigger fast link down. Some eye diagram fits were non-ideal, so we were not perfectly compliant, but EtherCAT worked reliably (no errors) with fast link down turned off, which is very misleading. I suspect the poor NEXT performance of the cable was causing the issue, but I could never see the smoking gun in the time domain. If TI published the SNR threshold that triggers fast link down, for instance, I could measure against it.

    5.  Fast link down flagged a problem I would not normally have seen without analyzing the cable characteristics in great detail, but it was very difficult to tie that error to any one performance metric. I honestly still don't know which metric it was, beyond the poor cable design revealed by multiple failed tests on a 100BASE-TX cable tester.
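    Items 2 and 4 come down to an SNR threshold that is not published. If such a threshold were known, a scope capture could be checked against it. A minimal sketch, assuming the capture has already been sampled at symbol centres and normalised so the ideal MLT-3 levels of 100BASE-TX are -A, 0, and +A: each sample is sliced to the nearest ideal level, the slicing error is treated as noise, and SNR = 20·log10(rms(ideal) / rms(error)) in dB. This is only an approximation and not necessarily how the DP83826 computes its internal metric; `threshold_db` is a hypothetical input, not a TI figure.

```python
import math
from typing import Sequence


def rms(values: Sequence[float]) -> float:
    """Root-mean-square of a non-empty sequence of samples."""
    if not values:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(v * v for v in values) / len(values))


def mlt3_snr_db(samples: Sequence[float], amplitude: float) -> float:
    """Estimate SNR (dB) of symbol-centre samples against ideal MLT-3 levels.

    Each sample is sliced to the nearest of -amplitude, 0, +amplitude; the
    difference between the sample and that level is treated as noise.
    """
    levels = (-amplitude, 0.0, amplitude)
    ideal = [min(levels, key=lambda lv: abs(s - lv)) for s in samples]
    errors = [s - lv for s, lv in zip(samples, ideal)]
    signal, noise = rms(ideal), rms(errors)
    if signal == 0.0:
        return -math.inf  # every sample sliced to 0: no usable signal
    if noise == 0.0:
        return math.inf   # noiseless capture: SNR is unbounded
    return 20.0 * math.log10(signal / noise)


def below_threshold(samples: Sequence[float], amplitude: float, threshold_db: float) -> bool:
    """True if the estimated SNR falls below a (hypothetical) trigger level."""
    return mlt3_snr_db(samples, amplitude) < threshold_db


if __name__ == "__main__":
    capture = [1.1, 0.0, -0.9, 0.1]  # illustrative samples, not measured data
    print(round(mlt3_snr_db(capture, 1.0), 1))  # about 18.2 dB
```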

    Thanks,

    Kevin