DP83869HM: Large packet loss / possible FIFO overflow

Part Number: DP83869HM
Other Parts Discussed in Thread: DP83869, AM6442

I am seeing high (>10%) packet loss using DP83869 in RGMII to Copper mode at 1Gbps. I originally saw this issue using a custom board, but have also been able to duplicate this problem on TI's AM64EVM board. I've narrowed down the problem to what appears to be some kind of FIFO overflow in the PHY when operating at 1Gbps.

With TI's AM64GPEVM board I can duplicate the problem by repetitively transmitting a 1516 byte packet (using a PRU core to achieve low-latency transmit). I have the IPG configured for 192 ns, double the 96 ns minimum, to rule out IPG issues, and I've verified the IPG by measuring the time between packets on the wire. I am currently only transmitting (no RX) to simplify debugging. Packets are captured on the wire using a ProfiShark 1G+.
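For reference, here is the timing arithmetic behind this test as a minimal sketch (the constants mirror the setup described above: 1 Gbps, 1516-byte frames assumed to include the FCS, an 8-byte preamble+SFD, and a 192 ns IPG):

```python
# Back-of-envelope wire timing for the test traffic described above.
# Assumptions: 1 Gbps (8 ns per byte), 8-byte preamble+SFD, and that the
# 1516-byte frame length already includes the 4-byte FCS.

NS_PER_BYTE = 8      # 1 Gbps => one byte every 8 ns
PREAMBLE_SFD = 8     # bytes
FRAME_BYTES = 1516
IPG_NS = 192         # configured inter-packet gap (2x the 96 ns minimum)

frame_ns = (PREAMBLE_SFD + FRAME_BYTES) * NS_PER_BYTE  # time on the wire per frame
cycle_ns = frame_ns + IPG_NS                           # one full send cycle
utilization = frame_ns / cycle_ns

print(f"frame time : {frame_ns} ns")      # 12192 ns
print(f"cycle time : {cycle_ns} ns")      # 12384 ns
print(f"line usage : {utilization:.1%}")  # ~98.4%
```

At this rate a new frame starts roughly every 12.4 µs, so the link is near saturation but still within spec.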

The problem occurs after transmitting 73 packets (1516 bytes each) successfully back to back with a 192 ns IPG between each packet. After this, packets 74 and 75 are always dropped (not seen via the ProfiShark tap). I do see packets 74-75 between the MAC and PHY via oscilloscope on the RGMII TXC/TXD lines. After packets 74-75 are dropped, packet 76 appears on the wire successfully. Thereafter, roughly 7-8 of every 30 packets sent are dropped. Since I've seen this problem on multiple boards and even on different PHYs on the same board, it appears to be some kind of FIFO overflow issue in the PHY.

If I reduce the transmit speed (one 1516 byte packet every 20us, every 50us, every 100us...) the packet loss improves. If I send fewer than one 1516 byte packet every 200 microseconds, the problem goes away entirely. Here are other ideas I've tried:

1. Verified the PHY is configured per the datasheet section 9.4.8.1 for RGMII to Copper (BMCR=0x1140, PHY_CONTROL=0x5048, GEN_CFG1=0x300)

2. Verified I'm using TI's provided value for TX clock shift on the ICSSG ports of the AM64EVM board (750ps)

3. Tried varying the TX CLK shift. It does not seem to have an impact until it is >1500ps, at which point I read garbage data on the wire (as expected).

4. Tried 0x1, 0x2, and 0x3 values in the TX_FIFO_DEPTH field of the PHY_CONTROL register (0x10), although the datasheet note seems to indicate TX_FIFO_DEPTH doesn't apply in RGMII mode.

5. Checked INTERRUPT_STATUS register 0x13. I'm seeing the XGMII_ERR_STATUS, ADC_FIFO_OVF_UNF, and FALSE_CARRIER bits set.

What else can I do to troubleshoot this problem?

  • More testing has shown that larger packets (3000 bytes) cause the problem to occur sooner (after ~40 packets sent), and shorter packets (300 bytes) cause it to occur later (after ~400 packets sent). So it does seem to be some type of overflow or similar problem.

    But even at a relatively "slow" rate like one 1520 byte packet every 20 microseconds (~60% of gigabit capacity), the problem still occurs frequently: sent packets do not appear on the wire.
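As a rough sanity check on the overflow theory: multiplying the packets sent before the first drop by the packet size gives a roughly constant byte total. A back-of-envelope sketch, using the approximate onset counts above:

```python
# Rough check of the overflow hypothesis: if a fixed-size buffer were filling
# up, (packets sent before first drop) x (packet size) should be roughly
# constant. The counts are the approximate onsets observed in testing.
observations = {
    1516: 73,   # bytes per packet : packets sent before first drop
    3000: 40,
    300: 400,
}

for size, count in observations.items():
    total = size * count
    print(f"{size:>5} B packets: ~{count} sent -> ~{total:,} bytes before drops")
# All three land in the same ~110-120 KB range, consistent with a
# fixed-size buffer filling up.
```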

    Any thoughts?

  • Hi Steven,

    Thank you for providing a detailed breakdown - I have a few questions to help debug your issue:

    Could you provide a register dump of registers 0x0-0x1F?

    In the custom board, is the processor being used also the AM64?

    "With TI's AM64GPEVM board I can duplicate the problem by transmitting a 1516 byte packet repetitively (using PRU core to achieve low latency transmit)"

    Where are you sending the packets from? What is the link partner? Is it a TI PHY?

    Best regards,

    Melissa

  • Hi Melissa,

    Thanks for responding! Yes, the custom board also uses the AM6442.

    I can duplicate this problem sending packets to a TI link partner (2nd DP83869 on same board), a non-TI link partner (laptop), and a 2nd non-TI link partner (ProfiShark 1G configured in SPAN mode).

    Here is a register dump taken while dropping packets:

    0x00 = 0x1140    0x08 = 0x6001    0x10 = 0x5048    0x18 = 0x6150
    0x01 = 0x796D    0x09 = 0x0300    0x11 = 0xBC02    0x19 = 0x4444
    0x02 = 0x2000    0x0A = 0x3C00    0x12 = 0x0000    0x1A = 0x0002
    0x03 = 0xA0F1    0x0B = 0x0000    0x13 = 0x1D44    0x1B = 0x0000
    0x04 = 0x01E1    0x0C = 0x0000    0x14 = 0x29C7    0x1C = 0x0000
    0x05 = 0xCDE1    0x0D = 0x401F    0x15 = 0x0000    0x1D = 0x0000
    0x06 = 0x006F    0x0E = 0x0C1F    0x16 = 0x0000    0x1E = 0x0212
    0x07 = 0x2001    0x0F = 0xF000    0x17 = 0x0040    0x1F = 0x0000

    Note I did unplug and re-plug the Ethernet cable after bringing up the PHY, but before sending packets. If I repeat the same test but dump the registers before plugging in the Ethernet cable and again after sending packets, the second dump shows 0x0000 in 0x13 (INTERRUPT_STATUS), 0xAC02 in 0x11 (PHY_STATUS), and 0x006D in 0x6 (ANER).
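Since registers 0x0-0xF follow the standard IEEE 802.3 Clause 22 layout, the BMSR value 0x796D above can be decoded generically. A minimal sketch (bit names are from Clause 22, not the DP83869 datasheet):

```python
# Minimal IEEE 802.3 Clause 22 BMSR (register 0x1) decoder, applied to the
# value 0x796D from the dump above. Bit positions follow the Clause 22 layout.
BMSR_BITS = {
    0: "extended capability",
    1: "jabber detected",
    2: "link up",
    3: "autoneg capable",
    4: "remote fault",
    5: "autoneg complete",
    6: "MF preamble suppression",
    8: "extended status (reg 0xF)",
    11: "10BASE-T half-duplex",
    12: "10BASE-T full-duplex",
    13: "100BASE-TX half-duplex",
    14: "100BASE-TX full-duplex",
}

def decode_bmsr(value):
    """Return the names of all BMSR bits set in `value`."""
    return [name for bit, name in BMSR_BITS.items() if value & (1 << bit)]

print(decode_bmsr(0x796D))
```

For 0x796D this reports link up and autoneg complete (with no jabber or remote fault), matching a healthy link state at the time of the dump.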

    Let me know if I can provide additional info, thanks!

  • Hi Steven,

    Thank you for providing this information. I need some time to review this issue and try out some tests in our lab, I will update you by Friday 6/30.

    Best regards,

    Melissa

  • Hi Melissa,

    We just discovered that the USB cable on our network tap was faulty, causing the tap to silently drop packets. The tap's "hardware dropped packets" counter is on a screen we rarely use, so it wasn't immediately obvious this was happening. After switching to another USB cable, the tap now shows all packets on the wire.

    Apologies for taking your time on this! The PHY is working great at full line rate.

    Thanks again!