AM335x EtherCAT CRC errors behind devices with Micrel PHY

Other Parts Discussed in Thread: TLK106, AM3359, SYSBIOS

Hello,


One of our customers reported problems with our Sitara AM3359 EtherCAT device (Texas Instruments TLK106 PHYs) when it is the direct topological neighbor of a device with Micrel PHYs (KSZ8721). Our device very often (about every 30 seconds) reports CRC errors on the frames coming from the Micrel PHY device. This happens only when our Sitara-based device and the Micrel PHY device are directly connected. In that case the Sitara always cuts off the frame from the Micrel device and reports a CRC error. The frames going to the Micrel device are received correctly.


If there is another EtherCAT device in between, there are no problems.


We are using the current sysbios_ind_sdk_02.01.02.02.

The Beckhoff PHY Selection Guide v2.3 states:

"Fast Link Down mode with 10 μs reaction time is supported (requires configuration via MII management, default is 200 μs). Recommended configuration for Fast Link Down mode in CR3: enable Bit 3 (RX Error count) and Bit 0 (Signal/Energy loss). MI Link detection and configuration must not be used, because register 9 is PHY specific."


As far as I can see in tiescbsp.c, Fast Link Down mode is used correctly (Board_phyFastLinkDownDetEnable()) and FastRXDV (Board_phyFastRXDVDetEnable()) is no longer used.

But what could be the reason for these CRC errors and cut-off frames? Fast link-down events triggered by the RX error count?
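
The bit arithmetic behind the Beckhoff recommendation quoted above can be sketched as follows. This is only an illustration of the mask: the helper names are hypothetical, and the register address and the MDIO access routine (which come from the PHY datasheet and the SDK) are deliberately left out.

```python
# Bit positions as named in the Beckhoff recommendation quoted above.
FLD_RX_ERROR_COUNT = 1 << 3      # bit 3: RX error count
FLD_SIGNAL_ENERGY_LOSS = 1 << 0  # bit 0: signal/energy loss

# Combined mask for the recommended Fast Link Down configuration.
FLD_RECOMMENDED_MASK = FLD_RX_ERROR_COUNT | FLD_SIGNAL_ENERGY_LOSS


def with_fast_link_down(cr3_value):
    """Return a 16-bit register value with both recommended bits set.

    Hypothetical helper: all other bits of cr3_value are preserved,
    as in a read-modify-write sequence.
    """
    return (cr3_value | FLD_RECOMMENDED_MASK) & 0xFFFF


def enabled_sources(cr3_value):
    """List which of the two recommended link-down sources are enabled."""
    sources = []
    if cr3_value & FLD_RX_ERROR_COUNT:
        sources.append("rx_error_count")
    if cr3_value & FLD_SIGNAL_ENERGY_LOSS:
        sources.append("signal_energy_loss")
    return sources


if __name__ == "__main__":
    print(hex(FLD_RECOMMENDED_MASK))
    print(enabled_sources(with_fast_link_down(0x0000)))
```

Reading the register back and passing it to a check like enabled_sources() would show whether the RX error count is in fact an active link-down source on our board.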

Best regards

Frank

  • Hi,

    The EtherCAT experts have been notified. They will respond here.
  • Frank,
    Did you try to capture the frames with a good network tracer? Could you provide such a trace?
    What are the PHYs in the 'other' device?
    If this is a PHY issue (or another HW issue such as layout), we probably need to push it to our Ethernet forum, where the PHY experts are listening. I am not sure whether there are known interop issues with TLK106 devices.

    Regards,
  • Our customer used a Hilscher netANALYZER at several points of the EtherCAT topology:

    Topology including netANALYZER ports:

    EC-Master - - -> MicrelPhydevice -> Port0 -> AM3359device -> Port2 -> otherServodevice -> Port3 -> AM3359device-> Port1 -> MicrelPhydevice- - -> EC-Master

    Frame:
    Port0 = OK
    Port2 = Error
    Port3 = Error
    Port1 = Error

    Our AM3359 device with TLK106 PHYs flagged the frame as a short frame with alignment error and CRC error.
    Behind our device the frame was only about 56 bytes long instead of 364 bytes.
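
To illustrate why a frame cut from 364 to 56 bytes ends up with both flags, here is a minimal sketch of the two checks on a captured frame. It assumes the capture delivers whole bytes from the destination MAC through the FCS (no preamble), so an alignment error, which means a non-integer number of bytes, cannot be represented here. The function names are made up for this example.

```python
import struct
import zlib

MIN_FRAME_LEN = 64  # minimum Ethernet frame length in bytes, FCS included
FCS_LEN = 4


def append_fcs(frame_without_fcs):
    """Append the Ethernet FCS (CRC-32, least significant byte first)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def classify_frame(frame):
    """Return the list of problems found in a captured frame (bytes)."""
    problems = []
    if len(frame) < MIN_FRAME_LEN:
        problems.append("short frame")
    if len(frame) <= FCS_LEN:
        # Nothing left to compute a checksum over.
        problems.append("crc error")
        return problems
    body, fcs = frame[:-FCS_LEN], frame[-FCS_LEN:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    if fcs != expected:
        problems.append("crc error")
    return problems


if __name__ == "__main__":
    good = append_fcs(bytes(360))    # 364 bytes including FCS
    print(classify_frame(good))      # intact frame
    print(classify_frame(good[:56])) # frame cut off after 56 bytes
```

A frame that is cut off mid-way loses its real FCS, so the last four bytes that did arrive are interpreted as the checksum and the CRC check fails.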

    analysis.zip

    We don't know which PHYs are in the other devices, sorry. We only know that the problem occurs only between our device and the device with the Micrel PHY, and only in the direction Micrel PHY -> TLK106 PHY. Maybe there are RX errors in the PHY, or the EtherCAT PRUSS is making an error. We could read the RX error count register from the TLK106 and continuously add it to a readable variable. Should we do this?
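
The accumulation idea could look roughly like the sketch below. It is a host-side model, not SDK code: read_counter stands for a hypothetical function that returns the raw register value, and whether the hardware counter is cleared on read, wraps, or saturates is an assumption that has to be checked against the TLK106 datasheet.

```python
class RxErrorAccumulator:
    """Accumulate a narrow PHY error counter into a wide running total.

    read_counter: hypothetical callable returning the raw register value.
    clear_on_read: True if the hardware resets the counter on each read
    (assumption, check the datasheet). If False, the counter is treated
    as free-running and wrap-around is handled modulo 2**width_bits;
    a counter that saturates instead of wrapping would need other logic.
    """

    def __init__(self, read_counter, clear_on_read=True, width_bits=16):
        self._read = read_counter
        self._clear_on_read = clear_on_read
        self._mask = (1 << width_bits) - 1
        self._last = None  # baseline for free-running mode
        self.total = 0

    def poll(self):
        """Read the counter once and return the number of new errors."""
        raw = self._read() & self._mask
        if self._clear_on_read:
            delta = raw
        elif self._last is None:
            delta = 0  # first read only establishes the baseline
            self._last = raw
        else:
            delta = (raw - self._last) & self._mask
            self._last = raw
        self.total += delta
        return delta


if __name__ == "__main__":
    samples = iter([3, 0, 5])  # fake register reads for demonstration
    acc = RxErrorAccumulator(lambda: next(samples))
    for _ in range(3):
        acc.poll()
    print(acc.total)
```

Polling this periodically and logging a timestamp whenever poll() returns a non-zero value would show whether RX errors line up with the cut-off frames.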

  • Frank,

    Thanks for the log. However, I don't understand your topology description in combination with the Hilscher. The way we use them, they have to be inserted in-line like this:

    Master - Port 0 / Port 1 - Device 1 - Port 2/3 - Device 2.

    So the single-port TAP mode is something I don't get... How do you connect that? The Hilscher would also record the returned frames, of course.

    From the log I can only see that Port 0 has a good frame. The rest are cut to 56 bytes... That could mean that the AM3359 only receives a partial frame and forwards it. However, the next device should then drop it, as there is no valid CRC, right?

    Regards,
  • Dear Frank,

    Yes, if otherServodevice is the end of the EtherCAT topology. Then Port0 and Port1 are TAP1, and Port2 and Port3 are TAP2 of the Hilscher netANALYZER.

    EC-Master - - -> MicrelPhydevice -> Port0 -> AM3359device -> Port2 -> otherServodevice -> Port3 -> AM3359device-> Port1 -> MicrelPhydevice- - -> EC-Master

    or better:

    EC-Master - - -> MicrelPhydevice - - -> Port0 - - -> AM3359device - - -> Port2 - - ->
                                                                                   otherServodevice
    EC-Master <- - - MicrelPhydevice <- - - Port1 <- - - AM3359device <- - - Port3 <- - -

    Yes, from the AM3359 onward the frame is cut off and carries the CRC error, alignment error and short frame flags.

    Something is going wrong inside the AM3359 (PRUSS) or the TLK106 PHY.

    I'll send you our TLK106 schematics (both ports are identical).

  • Frank,

    OK, now I get it. You are drawing the topology in a single direction; I am used to a cable-based topology map. I thought you had two AM3359 devices in your test...

    Anyway, I currently don't see a reason to suspect malfunctioning PRU firmware here. Something in the RX path of the TLK106 must be stopping the reception of further bytes. But I will check with the programmer.

    Is this issue reproducible somehow? Is it happening in the field only?
    Do you get the CRC errors randomly or after some time interval?

    Biser, can you ask the Ethernet experts to jump in and check the PHY schematic (I don't see anything special here, but I am no expert)?
    They may also know more about PHY interop and will need to help debug this. Apparently our device works if there is a different PHY on the master side.

    Regards,
  • Hello Frank,

    The issue is very reproducible. Under some conditions it seems to happen every 15-30 seconds:

    Our customer also reported that the error pattern depends on the individual device, i.e. it is a tolerance problem. Some of the Micrel PHY boards work better; others do not work at all (no stable link can be established). However, the Micrel PHY boards work correctly next to other EtherCAT devices.

    Yes, this link problem between some of the customer's boards and our TLK106 device seems to be a PHY problem and probably not a PRUSS issue. What about MDIO_enableLinkInterrupt()? This function is used in tiescbsp.c.

    What could we do to get more information and statistics from the TLK106, so that we can find the cause and a possible solution?

    Frank

  • Frank,

    Unfortunately, for debugging we would need to be able to reproduce this. Are you able to reproduce it in the lab?
    Otherwise a longer log over a few minutes may help.

    The fact that this is only reproducible with some devices may indicate a margin issue. Again, I can't help much with PHY interop issues, so we need to involve our Ethernet PHY experts. I will ask Biser again to involve them. I can't move forum entries.

    Regards,
  • Moving this to the Ethernet forum on Frank's recommendation (above).
  • Hello Frank, Hello Biser,

    Our customer has clarified his statements. Just by swapping the Micrel device, he gets either correct communication for hours or, with another board, unstable communication (with link losses, as in the picture above). That means this is a tolerance problem.

    While we use an external oscillator for the TLK106 (see the schematic picture), he uses an internal FPGA PLL as the clock input for the Micrel PHY. Maybe the clock source shifts more on some boards and less on others, depending on the tolerance of the specific hardware? Is it possible that the TLK106 has a limited PLL lock range for the RX data by default? Can this be changed to a wider range for more tolerance? To me it looks like a synchronization problem... Is this possible?
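
As a rough back-of-the-envelope sketch of the clock-shift question: the snippet below estimates how many bit times two clocks drift apart over one frame for a constant frequency offset. The ppm values are purely illustrative, not measured on any of the boards, and the model ignores jitter and short-term wander completely.

```python
PREAMBLE_BYTES = 8  # preamble plus start-of-frame delimiter


def frame_bits(frame_bytes, preamble_bytes=PREAMBLE_BYTES):
    """Number of bits on the wire for one frame, preamble included."""
    return (frame_bytes + preamble_bytes) * 8


def slip_in_bits(frame_bytes, offset_ppm):
    """Bit times of slip over one frame for a static frequency offset.

    offset_ppm is the relative difference between the two clocks in
    parts per million (illustrative input, not a measured value).
    """
    return frame_bits(frame_bytes) * offset_ppm * 1e-6


if __name__ == "__main__":
    for ppm in (50, 100, 200):  # illustrative offsets only
        print(ppm, "ppm ->", slip_in_bits(364, ppm), "bit times")
```

For the 364-byte frame from the trace, a constant offset in this range stays below one bit time per frame, so a measurement of the clock's jitter or short-term drift might say more than its average frequency error.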

  • Hi Frank and All,

    This sounds like a TX jitter problem on the side of the Micrel device, as you all have determined.

    The TLK106 is already optimized for high input jitter tolerance (IJT), and it is difficult for us to improve the IJT further.

    Is it possible to check the TX jitter of the Micrel PHY using the IEEE-defined 100BASE-TX method?

    Best Regards,
  • Hi Frank,
    We tried to reproduce the issue using similar packets on our hardware. However, we were not able to, as no CRC errors occurred. This seems to support the theory that the CRC errors are due to a PHY issue.
    Regards,