Problem with ARP / IP address

Rob Beck

We have ported NDK 2 to our own hardware, which uses a single C6457 and a VSC8221 PHY.

I have an intermittent problem: sometimes, after power-up, the target fails to report on StdOut that it has an IP address. This happens whether it is configured for a fixed IP address or is using DHCP.

Further inspection (with Wireshark) reveals that when the target does report an IP address, it has beforehand broadcast a "Gratuitous ARP" that announces its IP and MAC addresses to other hosts.

When the target does not report its IP address on StdOut, the "Gratuitous ARP" has not been sent.

Does anybody know where the "Gratuitous ARP" is sent from and why sometimes the target may send it and sometimes not?
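For reference when looking at the Wireshark capture, here is a minimal sketch of what a gratuitous ARP frame looks like on the wire. It is not NDK code: `build_gratuitous_arp` and the constants are hypothetical names for illustration, and it assumes the common request form in which the sender and target IP addresses are identical and the target hardware address is zero. The NDK may fill the target hardware address differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6
#define IPV4_LEN       4
#define GARP_FRAME_LEN 42   /* 14-byte Ethernet header + 28-byte ARP payload */

/* Hypothetical helper: fill buf (at least GARP_FRAME_LEN bytes) with a
 * gratuitous ARP request for the given MAC/IP pair. Returns bytes written. */
static size_t build_gratuitous_arp(uint8_t *buf,
                                   const uint8_t mac[MAC_LEN],
                                   const uint8_t ip[IPV4_LEN])
{
    uint8_t *p = buf;

    assert(buf != NULL && mac != NULL && ip != NULL);

    /* Ethernet header */
    memset(p, 0xFF, MAC_LEN);   p += MAC_LEN;   /* destination: broadcast */
    memcpy(p, mac, MAC_LEN);    p += MAC_LEN;   /* source: our MAC */
    *p++ = 0x08; *p++ = 0x06;                   /* EtherType: ARP */

    /* ARP payload */
    *p++ = 0x00; *p++ = 0x01;                   /* hardware type: Ethernet */
    *p++ = 0x08; *p++ = 0x00;                   /* protocol type: IPv4 */
    *p++ = MAC_LEN;                             /* hardware address length */
    *p++ = IPV4_LEN;                            /* protocol address length */
    *p++ = 0x00; *p++ = 0x01;                   /* operation: request */
    memcpy(p, mac, MAC_LEN);    p += MAC_LEN;   /* sender hardware address */
    memcpy(p, ip, IPV4_LEN);    p += IPV4_LEN;  /* sender protocol address */
    memset(p, 0x00, MAC_LEN);   p += MAC_LEN;   /* target hardware address (assumed zero) */
    memcpy(p, ip, IPV4_LEN);    p += IPV4_LEN;  /* target IP == sender IP: "gratuitous" */

    return (size_t)(p - buf);
}
```

In a capture, a frame like this can be picked out by the broadcast destination and by the sender and target IP fields being equal.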

over 15 years ago

0 burhan türkel over 15 years ago

Intellectual 470 points

"Gratuitous ARP" has lots of advantages.

1. "Gratuitous ARP" helps finding duplicate IPs.

2. ARP entries age out, so a "Gratuitous ARP" keeps them from aging.

These are the first two that come to mind.

To be honest, I didn't fully understand your problem. If you can explain it another way, I may be able to help.

Burhan.

0 Rob Beck over 15 years ago in reply to burhan türkel

Intellectual 970 points

Most of the time I get the StdOut below, and I can see our HTTP page on a PC and ping the target:

MAC Address read from EFUSE: 00-24-ba-7c-8f-f5
EFUSED MAC Address = 00-24-ba-7c-8f-f5
SGMII reset successful........
SGMII config successful........
Waiting for SGMII lock ...
SGMII lock successful!
Waiting for SerDes to come up ...
SerDes should be up and running ...
EMAC has been started successfully
Registeration of the EMAC Successful
Service Status: DHCPC    : Enabled :          : 000
Service Status: DHCPC    : Enabled : Running : 000
Link Status: 100Mb/s Full Duplex on PHY 0
Network Added: If-1:192.168.16.108
Service Status: DHCPC    : Enabled : Running : 017

However, sometimes I do NOT see the last 2 lines:

Network Added: If-1:192.168.16.108
Service Status: DHCPC : Enabled : Running : 017

Then I cannot get to our HTTP page and I cannot ping the target.

NOTE: The behaviour is the same if I do not use DHCP and instead use a fixed IP address: Sometimes I get a network added and sometimes not.

It appears that when the network is added the gratuitous ARP is also sent out by the C6457 target. However, when the network is not added the gratuitous ARP is not sent out.

Why is the NDK sometimes adding a network and sending a gratuitous ARP and sometimes it is not?

It appears that _HwPktPoll in ethdriver.c detects the link correctly and calls EMAC_linkStatus, which prints "Link Status: 100Mb/s Full Duplex on PHY 0" to StdOut. However, NC_NetStart, called by our client, sometimes does not call the "NetStart" function passed to it as a parameter, which is what prints "Network Added: If-1:192.168.16.108" to StdOut.

Why can I sometimes get a link but not get an IP address?

Also, could the problem be timing related? The prdNdk is scheduled to run every 100 ms. On the C6457EVM the SYSCLK is 1000 MHz, but we are running our hardware at 600 MHz. What would the effect be if the prdNdk were not serviced on time? Could it account for getting a link from the PHY but not getting an IP address?

0 burhan türkel over 15 years ago in reply to Rob Beck

Intellectual 470 points

Rob Beck said:

Why can I sometimes get a link but not get an IP address?

Also, could the problem be timing related? The prdNdk is scheduled to run every 100 ms. On the C6457EVM the SYSCLK is 1000 MHz, but we are running our hardware at 600 MHz. What would the effect be if the prdNdk were not serviced on time? Could it account for getting a link from the PHY but not getting an IP address?

I had hoped the problem was the link, but that now seems unlikely. Maybe there aren't enough data buffers.

Burhan.

0 Rob Beck over 15 years ago in reply to burhan türkel

Intellectual 970 points

Reducing the prdNdk period from 100 ms to 50 ms seems to make HTTP and ping more reliable, but it is difficult to tell.

Reducing the prdNdk period to 20 ms results in no link, i.e. I get the "Network Added" message but not the "Link Status" message.

Sometimes, even when I have both the Network Added and the Link Status messages, I still cannot ping the target, and arp -a shows "incomplete".

Is there a timing issue with the NDK under some conditions that would cause it not to send out the gratuitous ARP when it starts up?

I would appreciate some feedback from TI NDK experts on this.

0 ArunMani over 14 years ago in reply to Rob Beck

TI__Genius 9915 points

Hi Rob,

I have directed the issue to the core NDK team. Will keep you updated once I get any info.

Thanks,

Arun.

0 Hao over 14 years ago

TI__Intellectual 1485 points

One idea how to debug this problem:

If you are using CCS, you can check the EMAC statistics registers by viewing memory starting at 0x2c80200. Check TXGOODFRAMES and TXBCASTFRAMES (refer to the C6457 chip spec). In the failure case, if TXBCASTFRAMES > 0 and you still cannot capture the Gratuitous ARP packet, the problem is probably in the PHY, so please check your PHY configuration. If TXGOODFRAMES = 0, the problem may be in the NDK or the EMAC driver; you can set a breakpoint in nimu_eth.c or ethernet.c to see whether the NDK sends the ARP packet or not.
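The decision rule above can be written down as a small helper. This is only a sketch of the reasoning, not part of the NDK or CSL: `classify_tx` and the `tx_diag_t` values are hypothetical names, the counter values are assumed to be read by hand from the CCS memory view, and the middle case (good frames but no broadcasts) is an added assumption that the rule above does not cover.

```c
#include <stdint.h>

/* Hypothetical result codes for the TX statistics check. */
typedef enum {
    TX_DIAG_NOTHING_SENT,   /* TXGOODFRAMES == 0: look at the NDK / EMAC driver */
    TX_DIAG_NO_BROADCAST,   /* frames sent but no broadcasts (assumed case):
                               the broadcast ARP did not reach the EMAC */
    TX_DIAG_EMAC_SENT_BCAST /* TXBCASTFRAMES > 0: the EMAC transmitted a broadcast;
                               if nothing is captured, suspect the PHY */
} tx_diag_t;

/* Classify a failure from the two EMAC statistics counters. */
static tx_diag_t classify_tx(uint32_t tx_good_frames, uint32_t tx_bcast_frames)
{
    if (tx_good_frames == 0) {
        return TX_DIAG_NOTHING_SENT;
    }
    if (tx_bcast_frames == 0) {
        return TX_DIAG_NO_BROADCAST;
    }
    return TX_DIAG_EMAC_SENT_BCAST;
}
```

For example, `classify_tx(0, 0)` points at the software side, while `classify_tx(5, 2)` with nothing captured on the wire points at the PHY side.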

0 Rob Beck over 14 years ago in reply to Hao

Intellectual 970 points

Thanks, I'll try the EMAC register thing but from my original question:

Where in the code is the gratuitous ARP sent from?

If it is somewhere in nimu_eth.c or ethernet.c, as you mention, which function sends it? I cannot find it in the code (I could have missed it). If I could find it, then setting a breakpoint and finding out why would be easier.

0 Hao over 14 years ago in reply to Rob Beck

TI__Intellectual 1485 points

The function that sends a packet to the EMAC in nimu_eth.c is EmacSend(). The EMAC driver function is HwPktTxNext() in ethdriver.c (sorry, not ethernet.c). Both files are located under "ndk_2_1_0\packages\ti\ndk\src\hal\evm6457l\eth_c6457".

In the NDK code, I think the call flow is "NC_SystemOpen --> NC_NetStart --> NS_BootTask --> SPINet --> NtAddNetwork --> BindNew --> gratuitous ARP sent".

0 Steve15 over 14 years ago

Intellectual 385 points

I'm working with a DM648 and a VSC8221 PHY. I'm just getting started on this, but I'm not clear what needs to be done to adapt the NDK to the 8221. Any help you can provide would be great. What files/functions did you have to replace? Did you run into any issues, aside from this gratuitous ARP?

Steve

0 Rob Beck over 14 years ago in reply to Steve15

Intellectual 970 points

The problem here also seems connected to http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/72941/265018.aspx#265018

I've traced TX messages down through the following functions:

EmacSend (nimu ...)
HwPktTxNext (ethdriver.c)
EMAC_sendPacket (csl_emac.c)
emacEnqueueTx (csl_emac.c)

And then I've put breakpoints on the Emac_TxServiceCheck and emacDequeueTx.

From what I can determine, when the gratuitous ARP is not sent on the wire, the TX messages still go through the EMAC OK.

DHCP also fails because the DHCP discover is sometimes not sent out. When DHCP is used, the DHCP client on the target attempts to send the DHCP discover periodically, but it does not appear on the wire.

What could cause this? Why would TX messages sometimes not appear on the Ethernet cable (I'm using Wireshark) when they appear to go through the EMAC OK? I've looked at the EMAC statistics registers, and the TX good frames and TX broadcast frames registers are increasing.

Whatever the problem is, it is intermittent across software resets (i.e. CCS load program or halt/restart) but persistent between them: if the fault is present after a software reset, NO Ethernet messages emerge until the next software reset, and if TX works, it keeps working until the next software reset.

0 Rob Beck over 14 years ago in reply to Rob Beck

Intellectual 970 points

I've tried the nearside loop-back test (adapting Lyrtech code from the dual DSP C6457EVM to use on our single DSP system) and it never fails. I cannot get the connector loopback to work at all.

Do any of the TI people have any idea why the grat. ARP, and before it the DHCP discover, will sometimes emerge on the wire and sometimes will not?

I have compared our PHY MII and extended MII registers (a VSC8221) for the success and failure cases and they are identical. I have also compared the SGMII registers in the DSP for the success and failure cases, and again they are identical.

Also, as above, I've looked at the DSP EMAC statistics registers for TX packets and found that they increment whether DHCP discover and grat. ARP emerge on the wire or not.

Does anybody have any ideas based upon the data why sometimes messages do not emerge on the wire (TX) but RX always works?

Note that after a software reset, the TX either (1) is OK until the next software reset with no problems, or (2) does not work at all until the next software reset. i.e. it is only intermittent in the sense that after a software reset TX packets may or may not emerge on the wire. The problem, if present, then persists until the next software reset. This is verified because the DHCP daemon on the DSP periodically attempts to send DHCP discovers, which do not emerge.

This suggests a problem with set-up order or timing, or something to do with clocking or locking: something that, once it locks in a faulty state, stays that way.

Ideas? (Particularly TI personnel)

0 Rob Beck over 14 years ago in reply to Rob Beck

Intellectual 970 points

Looks like two problems going on here which may or may not be connected:

(1)

(Ethdriver) HwPktTxNext calls (csl_emac) EMAC_sendPacket calls (csl_emac) emacEnqueueTx to queue packet on EMAC. Interrupt handler (Ethdriver) HwTxInt calls (csl_emac) EMAC_TxServiceCheck calls (csl_emac) emacDequeueTx. EMAC registers TXGOODFRAMES and TXBCASTFRAMES indicate that the EMAC has TX'd data but the data does not emerge on the wire according to WireShark.

i.e. In this case the data reaches the EMAC queue, the EMAC interrupt routine is called, and the EMAC TX statistics registers indicate the data was sent, but according to Wireshark it was not.

(2)

(Ethdriver) HwPktTxNext calls (csl_emac) EMAC_sendPacket calls (csl_emac) emacEnqueueTx to queue packet on EMAC as above. However, the interrupt handler (Ethdriver) HwTxInt is NOT called and, therefore, TXQueue.count eventually decrements to zero. TXGOODFRAMES and TXBCASTFRAMES are both zero, and data does not emerge on the wire according to Wireshark.

i.e. In this case the data reaches the EMAC queue but lack of EMAC TX interrupt and EMAC statistics TX registers of zero indicates that the EMAC has not attempted to send the data. What would cause the EMAC TX interrupt not to trigger when a non-zero descriptor is written to the TX0HDP register?
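Case (2) could be caught at run time with a simple stall check. The following is a minimal sketch under stated assumptions, not code from ethdriver.c or csl_emac.c: `tx_watch_t` and `tx_watch_poll` are hypothetical names, `queued` is assumed to be incremented wherever a packet is handed to the EMAC, `completed` is assumed to be incremented from the TX interrupt path, and the poll is assumed to run from a periodic context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a TX stall check. */
typedef struct {
    uint32_t queued;          /* packets handed to the EMAC */
    uint32_t completed;       /* packets reported done by the TX interrupt */
    uint32_t last_completed;  /* value of 'completed' at the previous poll */
    uint32_t stalled_polls;   /* consecutive polls with pending packets and no progress */
} tx_watch_t;

/* Call periodically. Returns true once packets have been pending for
 * stall_limit consecutive polls without any TX completion. */
static bool tx_watch_poll(tx_watch_t *w, uint32_t stall_limit)
{
    bool pending;
    bool progress;

    assert(w != NULL);

    pending  = (w->queued != w->completed);
    progress = (w->completed != w->last_completed);
    w->last_completed = w->completed;

    if (!pending || progress) {
        w->stalled_polls = 0;   /* nothing outstanding, or the interrupt is alive */
        return false;
    }

    w->stalled_polls++;
    return w->stalled_polls >= stall_limit;
}
```

A true return would mark the point at which to halt and inspect the EMAC interrupt set-up, rather than waiting for the failure to show up as missing packets in Wireshark.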

I looked at section 2.5.2 of the TI EMAC/MDIO manual, which describes a possible race condition for the EOQ, but (csl_emac) emacDequeueTx handles this case.

I'm using the C6457 ethdriver.c and csl_emac.c that come with the NDK 2.0 and Lyrtech C6457EVM. I've adapted the csl_mdio in order to remove the contention the C6457EVM DSPs have for the single PHY because we are only using a single C6457.

Is there an ordering problem with the set-up in ethdriver.c HwPktOpen?

0 Rob Beck over 14 years ago in reply to Rob Beck

Intellectual 970 points

Just found out something related to DSP core speed and this problem:

When I speed up our C6457 to 1GHz core speed on our product hardware the problem goes away.

Also, for the Lyrtech C6457EVM if the DSP core (of the DSP I run the software on - the EVM has 2) is slowed down to 600MHz then the same Ethernet transmission problems occur as with our hardware.

So, is there a timing issue with NDK 2.0 as regards initialisation? (and there is quite a bit going on - EMAC, MDIO, SGMII, PHY)
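One way a core-speed dependency can arise, purely as an illustration and not a confirmed cause here, is when a wait or timeout in the initialisation path is counted in CPU cycles or loop iterations rather than in real time. The sketch below uses hypothetical helper names and only shows the arithmetic: the same cycle count lasts a different time at 600 MHz than at 1 GHz.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in microseconds of a wait that burns a fixed number of CPU cycles.
 * A clock in MHz is the number of cycles per microsecond. */
static uint32_t wait_us(uint32_t cycles, uint32_t core_mhz)
{
    assert(core_mhz > 0);
    return cycles / core_mhz;
}

/* Cycle count needed to wait a given number of microseconds at a given clock. */
static uint32_t cycles_for_us(uint32_t us, uint32_t core_mhz)
{
    return us * core_mhz;
}
```

With these, a wait of 1,000,000 cycles is 1000 us at 1000 MHz but about 1666 us at 600 MHz, and a 1000 us wait needs 600,000 cycles at 600 MHz. Any such cycle-counted wait in the EMAC, MDIO, SGMII or PHY set-up would therefore behave differently on the two clock settings.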

Of course, it may not be the NDK itself, it could be the NSP that works with the C6457 DSP.

We don't want to run at 1GHz because it generates more heat.
