DP83867 RX errors seen in the RECR counter

Hi

We have a new development with the DP83867 on board a module connected to an FPGA.

The link is up and fixed at 1G, full duplex.

The transmit path from the FPGA -> PHY -> Computer works fine up to the data rate I tested on the desk (~8k packets/second).

However, the RX path is unreliable. In the FPGA I receive a large number of RX_ER even at low rates (1 packet/s). Initially I suspected that the RX clock delay was not being set correctly; however, regardless of the setting, the RX_ER remains.

In addition, and more importantly, I can poll the Receiver Error Counter Register (RECR), address 0x0015, over the MDIO interface, and I see the counter incrementing as I send packets. This suggests to me that the PHY is seeing the errors even before the RGMII interface to the FPGA.
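A minimal sketch of this kind of polling, in Python. The `mdio_read(phy_addr, reg)` callable is hypothetical and stands in for whatever MDIO access the FPGA or host provides. The sketch also assumes that RECR is a 16-bit counter that is not cleared by reading it, which should be checked against the datasheet.

```python
import time

RECR = 0x15  # Receiver Error Counter Register address, as given in the post


def poll_recr(mdio_read, phy_addr, samples, interval_s=1.0, sleep=time.sleep):
    """Return the increase of RECR seen in each of `samples` polling intervals.

    mdio_read is a hypothetical callable: mdio_read(phy_addr, reg) -> int.
    Assumes a 16-bit counter that is NOT cleared on read.
    """
    deltas = []
    previous = mdio_read(phy_addr, RECR) & 0xFFFF
    for _ in range(samples):
        sleep(interval_s)
        current = mdio_read(phy_addr, RECR) & 0xFFFF
        if current >= previous:
            deltas.append(current - previous)
        else:
            # Counter went backwards: it was cleared or reset in between,
            # so count only what accumulated since then.
            deltas.append(current)
        previous = current
    return deltas
```

Sending a known number of packets during each interval and comparing that number with the returned deltas gives a rough error rate per packet.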

I have looked at a few of the other registers on the MDIO interface but I didn't see anything suspicious, however I may not be looking at the correct registers.

The datasheet does not document what would cause the RECR counter to increment. Do you have any documentation about this?

Are there any other registers I should read that might point to the source of the problem? Any thoughts or pointers would be helpful.

Diarmuid

  • Hi Diarmuid,

    Please see this thread for a quick explanation of the cause of RX_ERs:
    e2e.ti.com/.../497525

    More information can be found in the IEEE802.3 spec.

    Can you send me a register dump of the PHY?
    Registers 0x0 to 0x1F if you can would be great. Can you send me the readout before you link to the computer and after?

    What you can do is isolate the issue by doing various loopbacks.
    Please see the following app note on enabling various loopbacks to find where the error is occurring:
    www.ti.com/.../snla246a.pdf

    Kind regards,
    Ross
  • Ross

    Thanks for the reply.

    Before Connection:
    Read address 0x0 Data= 0x1140
    Read address 0x1 Data= 0x7949
    Read address 0x2 Data= 0x2000
    Read address 0x3 Data= 0xA231
    Read address 0x4 Data= 0x1E1
    Read address 0x5 Data= 0x0
    Read address 0x6 Data= 0x64
    Read address 0x7 Data= 0x2001
    Read address 0x8 Data= 0x0
    Read address 0x9 Data= 0x300
    Read address 0xA Data= 0x0
    Read address 0xD Data= 0x0
    Read address 0xE Data= 0x0
    Read address 0xF Data= 0x3000
    Read address 0x10 Data= 0x5048
    Read address 0x11 Data= 0x2
    Read address 0x14 Data= 0x29C7
    Read address 0x15 Data= 0x0
    Read address 0x16 Data= 0x0
    Read address 0x17 Data= 0x40
    Read address 0x1E Data= 0x2
    Read address 0x1F Data= 0x0
    Read address 0x31 Data= 0x10B0
    Read address 0x32 Data= 0x40D3
    Read address 0x33 Data= 0x0
    Read address 0x86 Data= 0x77
    Read address 0x134 Data= 0x1000
    Read address 0x135 Data= 0x0

    After connection:
    Read address 0x0 Data= 0x1140
    Read address 0x1 Data= 0x796D
    Read address 0x2 Data= 0x2000
    Read address 0x3 Data= 0xA231
    Read address 0x4 Data= 0x1E1
    Read address 0x5 Data= 0xCDE1
    Read address 0x6 Data= 0x6F
    Read address 0x7 Data= 0x2001
    Read address 0x8 Data= 0x6001
    Read address 0x9 Data= 0x300
    Read address 0xA Data= 0x3C00
    Read address 0xD Data= 0x401F
    Read address 0xE Data= 0x0
    Read address 0xF Data= 0x3000
    Read address 0x10 Data= 0x5048
    Read address 0x11 Data= 0xBD02
    Read address 0x14 Data= 0x29C7
    Read address 0x15 Data= 0x0
    Read address 0x16 Data= 0x0
    Read address 0x17 Data= 0x40
    Read address 0x1E Data= 0x2
    Read address 0x1F Data= 0x0
    Read address 0x31 Data= 0x10B0
    Read address 0x32 Data= 0x40D3
    Read address 0x33 Data= 0x0
    Read address 0x86 Data= 0x77
    Read address 0x134 Data= 0x1000
    Read address 0x135 Data= 0x0

    What I have just noticed is that there are two link status bits, one in PHYSTS and one in BMSR. However, they do not agree when the link is up: BMSR is low.
    Edit: In fact the value in BMSR seems to vary over time, mostly high.
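
    To make the comparison concrete, the two link bits can be decoded from the register 0x1 and 0x11 values in the dumps above. This is a Python sketch. The BMSR bit positions are the standard IEEE 802.3 Clause 22 ones; the PHYSTS positions (link at bit 10, duplex at bit 13, speed at bits 15:14) are an assumption about the DP83867 layout and should be verified in the datasheet.

```python
BMSR_LINK_STATUS = 1 << 2      # IEEE 802.3 Clause 22, register 0x01
BMSR_ANEG_COMPLETE = 1 << 5
PHYSTS_LINK_STATUS = 1 << 10   # assumed DP83867 PHYSTS (0x11) layout
PHYSTS_DUPLEX = 1 << 13
PHYSTS_SPEED_SHIFT = 14


def decode_link(bmsr, physts):
    """Decode the link-related fields of BMSR and PHYSTS into a dict."""
    speed_code = (physts >> PHYSTS_SPEED_SHIFT) & 0x3
    return {
        "bmsr_link": bool(bmsr & BMSR_LINK_STATUS),
        "bmsr_aneg_complete": bool(bmsr & BMSR_ANEG_COMPLETE),
        "physts_link": bool(physts & PHYSTS_LINK_STATUS),
        "physts_full_duplex": bool(physts & PHYSTS_DUPLEX),
        "physts_speed_mbps": {0: 10, 1: 100, 2: 1000}.get(speed_code),
    }


print(decode_link(0x796D, 0xBD02))  # "after connection" values from the dump
print(decode_link(0x7949, 0x0002))  # "before connection" values from the dump
```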


    Diarmuid

  • Ross

    I've been doing some testing this morning with the loopback configuration. Because I am using an FPGA it was easier for me to start with Far-End Loopback (Reverse Loopback). As I understand the loopback is just at the mii interface.

    The result of this testing is that I see much more reliable performance of the data path. I have tested with 300 packets/s from my PC to the PHY in loopback mode and successfully receive the packets in the PC. At higher rates I don't trust my verification script, but a visual check suggests that the packets are reliably looped back.

    So now I'm somewhat confused. Why would I be seeing RECR counts incrementing in my earlier test but in the reverse loopback the data seems to be reliably received and transmitted?

    Incidentally, the RECR does not increment during this loopback test. If there were RX_ERs present in the data, would they be visible in the loopback scenario?

    Thanks
    Diarmuid
  • Hi Diarmuid,

    Register 0x01 (BMSR) is a latched-low (LL) register, so you will need to read it twice to get the current link status. Register 0x11 (PHYSTS) is not latched. This might explain why you see variation in 0x01.
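
    A sketch of the double read, in Python. `mdio_read(phy_addr, reg)` is a hypothetical accessor, not a TI API; the link bit position is the standard IEEE 802.3 Clause 22 one.

```python
BMSR = 0x01
BMSR_LINK_STATUS = 1 << 2  # latched low per IEEE 802.3 Clause 22


def current_link_up(mdio_read, phy_addr):
    """Read BMSR twice and report the link state from the second read.

    The first read returns (and clears) any latched link-down event;
    the second read reflects the link state at the time of reading.
    """
    mdio_read(phy_addr, BMSR)            # discard: may hold a stale latched 0
    second = mdio_read(phy_addr, BMSR)
    return bool(second & BMSR_LINK_STATUS)
```

    If the first read shows the link down and the second shows it up, the link dropped at some point since the previous read but has since recovered.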

    Thank you for the information and great debug work.

    Nothing in the register dump you provided stands out besides the fact that the wires look to be crossed and there is inverted polarity detected, but the device will account for that and correct for error free operation.

    You should be seeing errors returned if the error is happening on the MDI. Since you are looping back before it is outputted on the MII, the issue could be with the connection between PHY and MAC.

    The data that you send to the PHY will still be visible on the RX pins with reverse loopback (unless you enabled the isolation bit).
    What I suggest is to place a scope probe on RX_CTRL along with RX_CLK. This will allow us to see if the error is appearing on the MII.

    What type of delay are you using? Internal delay within the PHY? MAC? or trace delay? You said that you tried playing with the ID but it had no effect? What setting are you using?

    Thank you,
    Ross
  • Ross

    Can you clarify this statement? I don't understand it:
    "You should be seeing errors returned if the error is happening on the MDI. Since you are looping back before it is outputted on the MII, the issue could be with the connection between PHY and MAC."

    What I have seen is that I get lots of errors, based on the RECR counter, when sending packets to the PHY in non-loopback mode.
    However, as soon as I enable reverse loopback and send the packets, I no longer see the RECR counter incrementing.

    I can't reconcile this behaviour. How could the MII interface have any impact on the RECR counter? Is it possible that there's a bug in the PHY which prevents the counter from incrementing in loopback? (I'm grasping at straws here!)

    With regard to the delays: for the TX delay I am using the PHY. For the RX delay I was initially using the MAC but have since moved to the PHY too. In neither scenario can I get reliable RX working. I do plan on looking at RX_CTRL and RX_CLK and using this app note to understand the timing (www.ti.com/.../snla243.pdf). However, I am still not convinced that the evidence I am seeing proves conclusively that the MII is the source of the problem.

    Diarmuid
  • Hi Diarmuid,

    If errors are occurring on the cable side, then I expect them to be returned to the computer when doing reverse loopback.

    Can you enable the MII or digital loopback and send packets from the FPGA and return them back to it?

    This will help in the debug. Also, please send me your schematic for review. I only need the 867 and passives around it.

    Kind regards,
    Ross
  • Ross

    I'm doing some loopback tests at the moment and will update the thread when I have results. In the meantime, this is the instance of the PHY:

    I noticed the resistors on the RX pins, which I am currently getting changed to 0R. I'd welcome any comments about the schematic!

    Diarmuid

  • Ross

    I have repeated a number of tests here and while I am not near a solution I have at least found some consistent behaviour.

    1. I have repeated the far-end loopback test using the PHY. I can confirm that when I configure the loopback test with REV_LOOP_RX_DATA_CTRL=0 (suppress RX packets to MAC) the loopback works perfectly. I do not see dropped packets on the link partner.
    2. Once I set REV_LOOP_RX_DATA_CTRL=1 (allow RX packets to MAC) I see dropped packets in the loopback test (~20%) AND I see the RECR counter incrementing.
    3. I have set up the FPGA to perform a loopback from RX to TX. With the correct RX and TX delays configured in the PHY, I see the same results as in test 2: some packets are dropped, and the RECR counter increments, accounting for the dropped packets.

    So to me it looks like the suppression of the RX packets is key. By doing this it's either directly or indirectly impacting on the performance.

    I haven't put a scope on the RX_CLK and RX_CTL pins yet but I wonder if the clock is running when REV_LOOP_RX_DATA_CTRL=0? I will put the scope on it tomorrow

    I tried the DIGITAL loopback mode on the MAC side and that seems to work. I didn't do exhaustive tests but it seems reliable, I'll see if I can verify that thoroughly tomorrow.

  • Hi Diarmuid,

    I don't want to distract you from the debug path Ross has taken you on but I want to see if you can answer a few quick questions on your initial setup.

    What kind of computer are you using? What model of NIC is in the PC the 867 is attached to?

    Can you try writing 0x170 to register 0x31 and see if this clears up your RX_ER issues?

    Best Regards,
  • Hi 

    Thanks for the suggestion.

    The network card is an Intel I218 LM (https://downloadcenter.intel.com/product/71307/Intel-Ethernet-Connection-I218-LM). However, for the most part I've been using it while connected to a Netgear GS605v3 desktop switch. I have tested it directly (not as thoroughly) but it does seem to exhibit the same behaviour.

    I tried that write to the register and it definitely has an impact on the RX_ER count in the reverse loopback mode. I need to quantify it during the day. What exactly have I changed? According to the datasheet the bits are reserved.

    Diarmuid

  • Diarmuid,

    We have seen conflicts with certain Intel NICs, and even some switches, that are related to auto-negotiation. The HW solution is strapping the RX_CTRL pin to mode 3.

    The aforementioned register configuration affects a timer that we'd otherwise like to leave untouched, hence the reserved marking. It is only necessary if RX_CTRL is not in mode 3. Please write this value to the register before auto-negotiation takes place.
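
    Since 0x31 lies above the directly addressable range 0x00 to 0x1F, the write presumably goes through the indirect access registers REGCR (0x0D) and ADDAR (0x0E). A sketch in Python, assuming the usual four-step sequence with device address 0x1F; `mdio_write(phy_addr, reg, value)` is a hypothetical accessor, and the sequence should be checked against the datasheet.

```python
REGCR = 0x0D                    # register control: function bits + device address
ADDAR = 0x0E                    # address / data register
DEVAD = 0x1F                    # assumed device address for extended registers
FUNC_ADDRESS = 0x0000           # next ADDAR access sets the register address
FUNC_DATA_NO_POST_INC = 0x4000  # next ADDAR access transfers data


def write_extended(mdio_write, phy_addr, reg, value):
    """Write `value` to an extended register through REGCR/ADDAR."""
    mdio_write(phy_addr, REGCR, FUNC_ADDRESS | DEVAD)
    mdio_write(phy_addr, ADDAR, reg)
    mdio_write(phy_addr, REGCR, FUNC_DATA_NO_POST_INC | DEVAD)
    mdio_write(phy_addr, ADDAR, value & 0xFFFF)


# Usage with a stand-in accessor that just records the MDIO writes:
log = []
write_extended(lambda a, r, v: log.append((a, r, v)), 0, 0x31, 0x0170)
```

    The call would be placed before auto-negotiation is started or restarted, per the note above.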

    This is slated to be updated in the DP83867 DS revision.

    Best Regards,
  • Rob

    Thanks for the update. I'll try that out.

    We currently do not control the RX_CTRL strap; we were assuming that whatever is set by the strap could be changed via MDIO?

    Diarmuid

  • Diarmuid,

    That is correct in this case. You can use MDIO to change the value after start up.

    Regards,
  • For the record, one issue that I discovered was prompted by Rob's post. I had tied off the PHY_LED* + PHY_GPIO* strap pins in the FPGA to low. I had thought this was a safe setting, but it seems to have had a major negative impact on the performance of the PHY in our application. So I left them undriven from the FPGA and it works better. I haven't dug any deeper than that, but in case someone else is having issues, this might help.

  • Diarmuid,

    Did you resolve your RX_ER issues?
  • Rob

    I am still seeing RX_ER but for the moment I am working around this in my particular design. However I will have to resolve it at some point so I'd like to keep this open for the moment

    Diarmuid

  • Hi guys

    Some more questions for you. I am seeing some behaviour that I can't explain at the moment. On our board we have two phys with a shared MDIO interface.

    On some builds of my FPGA one of the PHYs never comes up. However, it is responsive over MDIO, and what I've noticed is that it's constantly in a reset state (bit 15 of BMCR). What would be preventing the PHY from coming out of reset? I do know that some FPGA builds don't exhibit this behaviour, so it's related to something the FPGA is doing; however, I can't figure out what. The hardware reset pin is tied high in both cases.

    Here are some MDIO registers.

    Working PHY

    Read address 0x0 Data= 0x1140
    Read address 0x1 Data= 0x7969
    Read address 0x2 Data= 0x2000
    Read address 0x3 Data= 0xA231
    Read address 0x4 Data= 0x1E1
    Read address 0x5 Data= 0xCDE1
    Read address 0x6 Data= 0x6F
    Read address 0x7 Data= 0x2001
    Read address 0x8 Data= 0x6001
    Read address 0x9 Data= 0x300
    Read address 0xA Data= 0x3CFF
    Read address 0xD Data= 0x1F
    Read address 0xE Data= 0x0
    Read address 0xF Data= 0x3000
    Read address 0x10 Data= 0x5048
    Read address 0x11 Data= 0xBD02
    Read address 0x14 Data= 0x29C7
    Read address 0x15 Data= 0x0
    Read address 0x16 Data= 0x0
    Read address 0x17 Data= 0x40
    Read address 0x1E Data= 0x2
    Read address 0x1F Data= 0x0
    Read address 0x31 Data= 0x10B0
    Read address 0x32 Data= 0x10D3
    Read address 0x33 Data= 0x0
    Read address 0x6E Data= 0x3
    Read address 0x6F Data= 0x0
    Read address 0x86 Data= 0x77
    Read address 0x134 Data= 0x1000
    Read address 0x135 Data= 0x0

    Phy in reset

    Read address 0x0 Data= 0xC0
    Read address 0x1 Data= 0x7949
    Read address 0x2 Data= 0x2000
    Read address 0x3 Data= 0xA231
    Read address 0x4 Data= 0x1E1
    Read address 0x5 Data= 0x0
    Read address 0x6 Data= 0x64
    Read address 0x7 Data= 0x2001
    Read address 0x8 Data= 0x0
    Read address 0x9 Data= 0x300
    Read address 0xA Data= 0x0
    Read address 0xD Data= 0x401F
    Read address 0xE Data= 0xD3
    Read address 0xF Data= 0x3000
    Read address 0x10 Data= 0x5048
    Read address 0x11 Data= 0x8802
    Read address 0x14 Data= 0x29C7
    Read address 0x15 Data= 0x0
    Read address 0x16 Data= 0x0
    Read address 0x17 Data= 0x40
    Read address 0x1E Data= 0x2
    Read address 0x1F Data= 0x0
    Read address 0x31 Data= 0x170
    Read address 0x32 Data= 0xD3
    Read address 0x33 Data= 0x0
    Read address 0x6E Data= 0x0
    Read address 0x6F Data= 0x0
    Read address 0x86 Data= 0x77
    Read address 0x134 Data= 0x1000
    Read address 0x135 Data= 0x0
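
    For reference, the register 0x0 (BMCR) values from the two dumps can be decoded with the standard IEEE 802.3 Clause 22 bit definitions. A Python sketch; note that in the value as posted (0xC0) bit 15 is clear, while the auto-negotiation enable and duplex bits that are set on the working PHY are also clear.

```python
BMCR_BITS = {
    15: "reset",
    14: "loopback",
    13: "speed_select_lsb",
    12: "autoneg_enable",
    11: "power_down",
    10: "isolate",
    9: "restart_autoneg",
    8: "duplex_mode",
    7: "collision_test",
    6: "speed_select_msb",
}


def decode_bmcr(value):
    """Return the names of the BMCR bits set in `value`, highest bit first."""
    return [name for bit, name in sorted(BMCR_BITS.items(), reverse=True)
            if value & (1 << bit)]


print(decode_bmcr(0x1140))  # working PHY
print(decode_bmcr(0x00C0))  # PHY reported as stuck in reset
```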

  • One more very unusual issue I've seen: reception of Ethernet packets with long runs of 0x0 or 0xF seems to be problematic, producing RX_ER from the PHY. I have no idea why.

    I haven't quantified the issue yet as I can work around it but sooner or later I'll have to look more closely at it and solve it. Maybe this might be familiar to one of you?