DP83867IR: Initial packets being dropped by the PHY

Part Number: DP83867IR

We’re having some issues with the TI DP83867IRRGZT 10/100/1000 PHY. We noticed that when the PHY comes up and an IP address is obtained, the first packets sent out from our device seem to be dropped by the PHY. If we add a delay of a couple of seconds before sending any packets (after getting an IP), no packets are lost. Here are some notes from my software developer:

“On R700, when the network interface is brought up, I noticed that the first few request/discover messages sent to a DHCP server never made it to the server. They got lost somewhere in the reader between the device driver (link layer) and the PHY driver (PHY layer). Tcpdump captures on the readers show all request/discover messages sent by dhcpcd, which match the dhcpcd.log entries. Wireshark captures on the DHCP server side indicate that some initial request/discover messages never made it to the server. Since tcpdump operates at the device driver level and we see all messages there, the initial messages must be lost between the PHY and the external network components.

To rule out an issue with the routers in between, I connected the reader to a hub (with a power injector) and monitored the traffic on the hub ports using Wireshark, comparing it with the Wireshark capture taken at the DHCP server machine.”

Thanks!
Lauren

  • Hi Lauren,

    If I understand correctly, the first few messages never make it out on the Ethernet cable? Is the processor checking for link-up before sending the packets?

    -Regards

    Aniruddha

  • Correct and yes it is.

  • Hi,

    Any update on this?

    Thanks,
    Lauren

  • Hi Lauren,

    I am a member of Aniruddha's team and will continue supporting you on this issue. What is the interval between receiving the IP address and sending the first packet? The inter-packet gap for 1G communication is 96 ns. Are you seeing any link instability between receiving the IP address and sending the first packets?
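
As a worked version of the 96 ns figure: the minimum interpacket gap in IEEE 802.3 is 96 bit times, so its duration depends on the link speed. A minimal sketch of that arithmetic (the function name is illustrative only):

```python
IPG_BITS = 96  # minimum interpacket gap in bit times (IEEE 802.3)


def ipg_ns(speed_mbps):
    """Return the minimum interpacket gap in nanoseconds for a link speed."""
    # One bit time in nanoseconds is 1000 / speed_mbps.
    return IPG_BITS * 1000 / speed_mbps


for speed in (10, 100, 1000):
    print(f"{speed:>4} Mb/s: {ipg_ns(speed):g} ns")
```

The gap is many orders of magnitude shorter than the multi-second window discussed in this thread.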

    Thank you,

    Nikhil

  • Hi Nikhil (why does that sound familiar?)

    Thanks for helping out! Here's the response from the customer below. Let me know if you need further information.

    This is happening prior to having an IP address. When the network interface is brought up, the DHCP client starts sending request/discover messages to the broadcast address to request an IP address. The DHCP client does wait for link-up before it starts sending any messages. These messages get lost in the PHY if they are sent within 5 seconds after the link is up.

    I was able to configure the PHY for digital loopback, and I saw all packets sent to the PHY get routed back (using Wireshark to capture the traffic). This proves that all packets from the higher layers made it to the PHY and got lost somewhere in the PHY.

    The PHY seems stable based on its status register 0x11. Is there any other configuration or status register that you would recommend paying more attention to? Is there any timing parameter or strapping value that we should double-check?

     

    Here is the content of the first 32 registers from the phy prior to any network packet being sent to the phy.

     

    1140 796d 2000 a231 05e1 41e1 0065 2001 0000 0200 0000 0000 0000 401f 0c4d 3000
     d048 6c02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000
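
For reading a dump like the one above, a small helper can map each word to its register address. This is only a sketch: it assumes the 32 words are registers 0x00 through 0x1F in order, as "the first 32 registers" suggests, and `parse_dump` is an illustrative name rather than part of any driver or tool.

```python
def parse_dump(text, base=0):
    """Map register address -> 16-bit value for a whitespace-separated hex dump."""
    return {base + i: int(word, 16) for i, word in enumerate(text.split())}


# Dump copied from the post above (assumed to be registers 0x00..0x1F in order).
DUMP = """
1140 796d 2000 a231 05e1 41e1 0065 2001 0000 0200 0000 0000 0000 401f 0c4d 3000
d048 6c02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000
"""

regs = parse_dump(DUMP)
print(f"reg 0x01 = 0x{regs[0x01]:04x}")  # basic mode status register
print(f"reg 0x11 = 0x{regs[0x11]:04x}")  # PHY status register mentioned above
```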

  • Hi Lauren,

    Thanks for the info. Can you also provide registers 31, 32, 33, 86, 6E, and 6F in their register dump? Additionally, can you provide a register dump while packets are being dropped, and while no packets are lost? 

    What is the smallest delay you have put on the device to send data with no packet loss? 

    I also see that registers 9 and 10 have been changed from the default values. Have these been deliberately changed and why? 

    Thank you,

    Nikhil

  • If I add a 5-second delay between link-up and the first packet sent, it always works. If I add a 4-second delay, it works 50% of the time.
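
The delay workaround described here could be sketched as follows. This is not the customer's code: it assumes a Linux system where link state is read from the sysfs `carrier` file, and the function names, the interface name `eth0`, and the default values are placeholders for illustration.

```python
import time
from pathlib import Path


def carrier_is_up(carrier_path):
    """Return True if the sysfs carrier file reads '1'."""
    try:
        return Path(carrier_path).read_text().strip() == "1"
    except OSError:
        # The file is missing or unreadable, e.g. the interface is down.
        return False


def wait_then_hold_off(carrier_path, holdoff_s=5.0, timeout_s=60.0, poll_s=0.1):
    """Wait for carrier, then sleep holdoff_s before the caller sends traffic.

    Returns False if carrier does not come up within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while not carrier_is_up(carrier_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    time.sleep(holdoff_s)  # empirical hold-off; 5 s is the value reported above
    return True
```

A caller might use `wait_then_hold_off("/sys/class/net/eth0/carrier")` before starting the DHCP client. A fixed hold-off only hides the delay; it does not explain where the packets are lost.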

     

    Registers 0x9 and 0x10 were probably changed by the Linux PHY driver (dp83867). Is there any concern about the values in these registers?

    Here is a dump for a good case:

     

     1140 796d 2000 a231 05e1 41e1 0067 2001 0000 0200 0000 0000 0000 401f 0c4d 3000

     d048 7f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

    Here is a dump for a failing case:

     

     1140 796d 2000 a231 05e1 41e1 0067 2001 0000 0200 0000 0000 0000 401f 0c4d 3000

     d048 7f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

    Here is the dump of registers every second for 5 seconds after the network interface is up:

     

    1140 7949 2000 a231 05e1 0000 0064 2001 0000 0200 0000 0000 0000 401f 0c4d 3000

     d048 0002 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

     1140 7949 2000 a231 05e1 0000 0064 2001 0000 0200 0000 0000 0000 401f 0100 3000

     d048 0000 ec10 0042 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

     1140 796d 2000 a231 05e1 41e1 0067 2001 0000 0200 0000 0000 0000 401f 0100 3000

     d048 7f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

     1140 796d 2000 a231 05e1 41e1 0065 2001 0000 0200 0000 0000 0000 401f 0100 3000

     d048 6f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

     1140 796d 2000 a231 05e1 41e1 0065 2001 0000 0200 0000 0000 0000 401f 0100 3000

     d048 6f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100

     

     1140 796d 2000 a231 05e1 41e1 0065 2001 0000 0200 0000 0000 0000 401f 0100 3000

     d048 6f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000

    0x0031: 0x1030

    0x0032: 0x00d3

    0x0033: 0x0000

    0x0086: 0x0087

    0x006e: 0x0000

    0x006f: 0x0100
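
To see what changes between two of the per-second snapshots above, the dumps can be compared word by word. The sketch below compares the first and third snapshots and assumes, as before, that the 32 words are registers 0x00 through 0x1F in order; the helper names are illustrative.

```python
def parse_dump(text):
    """Return the dump as a list of 16-bit values, indexed by register address."""
    return [int(word, 16) for word in text.split()]


def changed_registers(before, after):
    """Return {address: (old, new)} for every register whose value differs."""
    return {
        addr: (old, new)
        for addr, (old, new) in enumerate(zip(before, after))
        if old != new
    }


# First and third snapshots, copied from the dumps above.
FIRST = parse_dump("""
1140 7949 2000 a231 05e1 0000 0064 2001 0000 0200 0000 0000 0000 401f 0c4d 3000
d048 0002 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000
""")
THIRD = parse_dump("""
1140 796d 2000 a231 05e1 41e1 0067 2001 0000 0200 0000 0000 0000 401f 0100 3000
d048 7f02 ec10 0000 29c7 0000 0000 0040 6561 4444 0002 0000 0000 0000 0082 0000
""")

for addr, (old, new) in changed_registers(FIRST, THIRD).items():
    print(f"reg 0x{addr:02x}: 0x{old:04x} -> 0x{new:04x}")
```

Between these two snapshots only registers 0x01, 0x05, 0x06, 0x0e, and 0x11 differ.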

  • Hi Lauren,

    Thanks for the detailed report. Based on register 1 bit[2] and register 0x11 bit[10] in the register dumps taken every second for five seconds after the network interface is up, it looks like the PHY does not complete auto-negotiation and there is no link for the first few seconds.

    You mentioned you are checking register 0x11 for link before sending packets. Can you please confirm? How are you ensuring there is a valid link before sending packets? It seems the DHCP client may be sending packets before the link has been established, which is why they are being dropped.
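
The bit checks mentioned above can be sketched as below. The bit positions are the standard IEEE 802.3 clause 22 status bits in register 0x01 and the link status bit in the DP83867 PHY status register 0x11; the function name and the choice to require all three bits are assumptions for illustration. The sample values are taken from the dumps earlier in the thread.

```python
BMSR_LINK_STATUS = 1 << 2     # register 0x01, bit 2 (latches low until read)
BMSR_ANEG_COMPLETE = 1 << 5   # register 0x01, bit 5
PHYSTS_LINK_STATUS = 1 << 10  # register 0x11, bit 10


def link_ready(bmsr, physts):
    """Return True only if both registers report link and autonegotiation is done."""
    return (
        bool(bmsr & BMSR_LINK_STATUS)
        and bool(bmsr & BMSR_ANEG_COMPLETE)
        and bool(physts & PHYSTS_LINK_STATUS)
    )


# (register 0x01, register 0x11) pairs from the per-second dumps.
samples = [(0x7949, 0x0002), (0x7949, 0x0000), (0x796D, 0x7F02), (0x796D, 0x6F02)]
for bmsr, physts in samples:
    print(f"0x01=0x{bmsr:04x} 0x11=0x{physts:04x} link_ready={link_ready(bmsr, physts)}")
```

Because the link status bit in register 0x01 latches low, a single read can report a stale link-down; reading it twice, or relying on register 0x11, avoids that.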

    Thank you,

    Nikhil 

  • There may be some confusion.

    I wanted to provide a dump of the first 5 seconds after the network interface is brought up so you can see the changes, but that doesn't mean the link is up during that time. The DHCP client always waits for link-up before it starts sending packets.

     

    If you take a look at the failing and good cases, those are the dumps taken right before calling the DHCP client.

  • Hi Lauren,

    You mentioned you have tried a digital loopback. Was this during the first five seconds where packets were being dropped? If not, can you run an analog loopback during this time and check the status of the PHY?

    Additionally can you try a reverse loopback during this time? The link partner will need to send packets immediately after link to recreate the scenario where packets are lost in the first five seconds.

    Can you share the schematic?

    Thank you,

    Nikhil

  • Yes, I have tried digital loopback and saw all packets sent from the MAC routed back. This happened during the first 5 seconds. I was not able to configure analog loopback; what is the sequence for analog loopback? I used the same configuration as for digital loopback, except with bits [5:2] of register 0x16 set to analog, but that didn’t work.

     

    One thing I should mention too is that in our lab network environment, I have to wait 30 seconds to avoid packet loss, versus 5 seconds in the office environment. Also, we use PoE in the office and potentially PoE+ in the lab.

     

    Does it make sense to try the reverse loopback if the issue is with transmitting?

  • One thing that I discovered today is that this problem seems to occur when we connect the system to a Cisco switch. I don't see the initial packet drop issue with TP-Link and TrendNet switches.

    Hopefully that helps narrow down the issue a little.

  • Hi Lauren,

    Did the problem occur using other units of the same Cisco switch? Does the Cisco switch have loopback capability? Are you able to perform a loopback with the switch?

    Thank you,

    Nikhil

  • We have figured out the issue. The problem has to do with the spanning tree configuration on Cisco switches, though we still need to fully validate the theory in the lab.

     

    Basically, by default a Cisco switch blocks traffic on a port after link-up until it finishes running the spanning tree protocol to determine whether the network topology has changed. For ports that connect to endpoints such as workstations, there is no need to go through this process, so they should be configured as edge ports with fast link so that traffic can be forwarded as soon as the link is up. I have validated this with my office setup and will work with the IT team on the lab setup. I will reach out to you again if we need further assistance.
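
The behaviour described above can be illustrated with a simplified timing model. It assumes classic IEEE 802.1D spanning tree, where a port spends one forward-delay interval (15 seconds by default) in each of the listening and learning states before forwarding, and it treats an edge port (Cisco's PortFast feature) as forwarding immediately. The function names are illustrative, and the actual switch configuration is not given in this thread.

```python
def time_to_forwarding(forward_delay_s=15.0, edge_port=False):
    """Seconds after link-up before the switch port forwards frames (simplified)."""
    if edge_port:
        return 0.0
    # Listening and learning each last one forward-delay interval.
    return 2 * forward_delay_s


def dropped_sends(send_times_s, **port_settings):
    """Return the send times (seconds after link-up) that the port would drop."""
    ready_at = time_to_forwarding(**port_settings)
    return [t for t in send_times_s if t < ready_at]


sends = [1, 4, 5, 29, 31]
print(dropped_sends(sends))                  # default 802.1D timers
print(dropped_sends(sends, edge_port=True))  # edge port: nothing is dropped
```

With the default timers the model gives a 30 second window, which is consistent with the 30 seconds seen in the lab. The 5 second window seen in the office would imply different timers or a different spanning tree variant, which the thread does not specify.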

     

    Thank you for your help on this. At this point, the issue points to switch configuration rather than our systems.

     

  • Hi Lauren,

    Glad to hear it! Please reach out via a new thread if you need any further assistance. 

    Thank you,

    Nikhil