I'm using lwIP on 9B92, and even though I thought I fixed this already, something breaks when I connect to a hub instead of a switch.
Normally I have the device connected to a 100Mb/s switch. No problems. But when I connect it to my 10Mb/s hub so I can monitor traffic with my computer, lwIP communicates for a short time, about 10 seconds, then the Ethernet hardware stops receiving frames. It still transmits frames, but never receives any. Only a reset fixes it. I have confirmed that the Ethernet controller itself is at fault. The code in stellarisif.c always returns NULL when this occurs:
/* Check if a packet is available, if not, return NULL packet. */
if((HWREG(ETH_BASE + MAC_O_NP) & MAC_NP_NPR_M) == 0)
{
    return(NULL);
}
Also, this issue doesn't always occur. It seems to occur more on connections over the Internet vs. connections on my LAN. I can only assume it's a latency issue. Again, absolutely no problems when connected to a switch.
I have removed ETH_CFG_TX_DPLXEN from EthernetConfigSet() so it's set in half duplex. Is that the only setting that is required?
Will the Ethernet controller go into some type of error mode and stop receiving after too many errors? I don't see anything in the datasheet related to this.
Does the standard enet_lwip example work properly? If so, examine the differences between your code and it.
Do I understand correctly that in half duplex mode it works as expected and the error only occurs in full duplex mode?
I believe there are some transmit and receive error conditions that can trigger interrupts. You may want to check whether those are being triggered and how we / lwIP handle them. One thing that may be happening: if received frames are in error and not properly dropped, are you then running out of memory to allocate for the next RX frame? In any case, check memory usage and the way lwIP allocates memory for packets.
Dexter
I haven't tried enet_lwip, but have based my project off of TI's example code heavily.
I have Stellaris set to half-duplex, but the problem only occurs on hubs, not switches. So I would think that Stellaris is always running in half-duplex, which is fine with me. This issue actually appears to be more of a hardware problem than anything. I have disabled full duplex by removing ETH_CFG_TX_DPLXEN from EthernetConfigSet(). I have not added any other code that takes action based on link status; I only check link status when not streaming data by reading PHY_MR1 and PHY_MR31. Interrupts are set as such: EthernetIntEnable(ETH_BASE, ETH_INT_RX | ETH_INT_TX);
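With only ETH_INT_RX and ETH_INT_TX enabled, the error conditions mentioned earlier in the thread would pass unnoticed. The following is a minimal sketch of how error interrupts could be counted so they can be inspected in a debugger. The SKETCH_INT_* names and their bit values are assumptions that stand in for the StellarisWare ETH_INT_* flags and should be checked against ethernet.h; eth_stats_t and both functions are hypothetical helpers, not part of lwIP or StellarisWare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the StellarisWare ETH_INT_* flags.
 * Verify the real names and values in ethernet.h before use. */
#define SKETCH_INT_RX   0x001u  /* frame received            */
#define SKETCH_INT_TXER 0x002u  /* transmit error            */
#define SKETCH_INT_TX   0x004u  /* transmit FIFO empty       */
#define SKETCH_INT_RXOF 0x008u  /* receive FIFO overflow     */
#define SKETCH_INT_RXER 0x010u  /* receive error             */

/* Hypothetical counters, one per interrupt cause. */
typedef struct {
    uint32_t rx_ok;
    uint32_t tx_ok;
    uint32_t tx_err;
    uint32_t rx_overflow;
    uint32_t rx_err;
} eth_stats_t;

/* Update the counters from one interrupt status word.
 * In a real handler the status word would come from the driver
 * library's interrupt status call and be cleared afterwards. */
static void eth_stats_update(eth_stats_t *s, uint32_t status)
{
    assert(s != NULL);
    if (status & SKETCH_INT_RX)   { s->rx_ok++; }
    if (status & SKETCH_INT_TX)   { s->tx_ok++; }
    if (status & SKETCH_INT_TXER) { s->tx_err++; }
    if (status & SKETCH_INT_RXOF) { s->rx_overflow++; }
    if (status & SKETCH_INT_RXER) { s->rx_err++; }
}

/* True if any error counter is non-zero. */
static bool eth_stats_has_errors(const eth_stats_t *s)
{
    assert(s != NULL);
    return (s->tx_err | s->rx_overflow | s->rx_err) != 0u;
}
```

To use something like this, the error flags would also have to be added to the mask passed to EthernetIntEnable(). Comparing the counters just before and just after the receiver stalls might show whether a burst of RX errors or overflows precedes it.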
Here is some additional information I obtained after more experimentation. Attached is a simple drawing of the important part of my network that might help to visualize what I'm doing. Normally I connect everything to the switch, but when I want to monitor the Stellaris board's traffic with Wireshark, I connect the hub as shown. The Stellaris board is streaming audio from the Internet.
1. When the Stellaris board is connected to the switch, it runs all day long, no problem
2. When connected to the hub, it works for a few minutes only. Stellaris is able to Tx packets (as seen by the Ethernet activity LED and Wireshark), but will not Rx. As a matter of fact, when this problem occurs, pinging Stellaris with the computer will not even cause Stellaris' Ethernet activity LED to blink! I however see the hub's LEDs flickering, and have swapped ports on the hub between Stellaris and the computer. The hub is perfectly functional. The hub is indicating a link, and Stellaris is indicating a 10Mb/s link according to PHY_MR1 and PHY_MR31.
3. After this issue occurs on the hub, everything works when I move Stellaris to the switch. I don't even have to reset Stellaris. Moving Stellaris back to the hub again causes the inability to Rx frames, until I reset Stellaris.
It's as if Stellaris gets into a state where it refuses to communicate at 10Mb/s, but is still ok with 100Mb/s. Is there anything else I can try?
The enet_lwip application has been around for many years and used by many customers in lots of applications. It may not be perfect but it is well tested and proven. Even if you have based your code on ours, subtle differences can make all the difference. Run enet_lwip on the hub.
Also look at all of the Ethernet registers with a debugger to see what, if any, error bits are set. Hubs are less intelligent than switches, so errors and collisions are more likely to occur regardless of speed. I suspect such a collision or error has occurred and your code is not properly clearing it.
Another thing to try, if you have control over the streaming music server, would be to slow down the packet rate to the Stellaris. If a slower packet rate does not generate this error then it may be an overrun condition.
Regards
I will look into trying enet_lwip, although I have to see how to transfer data to get it to occur.
I have just tried comparing the PHY and MAC registers before the issue occurs and after, and they are identical. I'm almost certain this is not a stack/software issue. If it were, how could the Stellaris part work correctly when moved to the switch, without power cycling or resetting, then not work again when I plug it into the hub? This makes no sense.
And on top of that, it DOES work on the hub for a few minutes, and works all day long on the switch.
Yes, collisions only happen on hubs, but I see collisions occurring when connected to the hub for several minutes before it craps out.
The data transfer rate in my test case is only about 100 kilobits/s. It works fine at the max rate of about 350 kbit/s, at least for a few minutes on a hub.
This is C3 rev LM3S9B92. I don't recall errata pertaining to anything like this, but this HAS to be a hardware issue. The 10Mb section of the PHY fails, while the 100Mb section picks up and continues as if nothing is wrong.
An update:
I have not had time to try enet_lwip yet on my eval board, but this discovery will give me something simple to try on it.
With my computer and Stellaris board connected to the hub, and the hub also connected to my LAN, I start transferring large amounts of data between my computer and something on my LAN. So the Stellaris board is seeing all this traffic, but since promiscuous mode is off, the MAC should be blocking it.
Then, I start to ping the Stellaris board from the computer. The Stellaris board otherwise is not communicating with anything else. The pings are set up to contain 1000 bytes of data. The Stellaris board responds normally for awhile (a minute or two, or maybe less) then quits. Meanwhile, just to prove the hub is still functioning, the computer is still happily transferring data.
The LED indicates the Stellaris board has a link, but the activity LED never blinks again unless I 1) reset the board, or 2) move the Stellaris board to the switch. Even attempting to ping a non-existent IP address, which would normally make the Stellaris activity LED blink since my computer is sending out a broadcast ARP request, has no effect.
I have rechecked the C3 errata and find nothing related to Ethernet other than the LED settings, which I'm already aware of. This is strange. This seems to me to be a PHY ISSUE! Why else does it kill even the activity LED?
I understand this is frustrating. I will try to do some more testing here.
Please perform this latest test with the enet_lwip to confirm.
Thank you for keeping in contact. I was able to try enet_lwip on my 9B96 eval board, and so far I'm unable to kill it. However, the chip on the board in question is a 9B92 with that factory programmed patch from 0x0 - 0x1000 (rev C3), while the 9B96 does not need this patch.
I am planning on trying enet_lwip on the 9B92 chip as soon as possible.
The factory programmed patch was only in Rev C1. We have since fully production qualified revision C5. If you are still using C1 or C3 devices you should transition as soon as possible to the production qualified C5 version. The revision is marked on the second line of the part, just after 80.
You are right to do the enet_lwip test on the 9B92 as well. If it fails this might point to PCB design and layout. If it passes then it points to a possible software difference.
Ok this is a C1 then; I know it has the patch because I have to start programming at 0x1000 and do some other changes.
I have found the problem, but do not understand why this causes it. A while ago I removed the parameter ETH_CFG_TX_DPLXEN from EthernetConfigSet() because otherwise I saw so much packet loss when connected to a hub that TCP connections would break down. Even DHCP and DNS requests would fail most of the time. Incidentally, the packet loss occurred even when nothing else was on the hub except the 9B92 and whatever it was communicating with. Again with a switch, no problem.
This made sense to me as full duplex cannot be used on a hub. This appeared to fix the problem, but lately I have discovered (because I normally connect this thing to a switch only) that after a period of time, the 9B92 Ethernet controller gives up and refuses to Rx anything from a hub until it's reset or connected to a switch.
So the question is, how do I handle connections to hubs vs. switches? I have never found any code in the lwIP examples on how to do this. This is obviously a hardware question since the Ethernet controller must drop into half duplex. My thought was that leaving it in half duplex permanently was an easy fix, but when connected to a hub, it's not happy with either setting.
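One way to approach the hub vs. switch question is to derive the duplex setting from the auto-negotiation result instead of hard-coding it. The sketch below assumes the standard IEEE 802.3 ability bits in MII registers 4 (local advertisement) and 5 (link partner ability), and the standard rule that a partner which does not auto-negotiate, such as a typical hub, is handled by parallel detection and must be treated as half duplex. The function and type names are hypothetical, and how the PHY reports the detected speed is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.3 technology ability bits, as laid out in MII
 * registers 4 (advertisement) and 5 (link partner ability). */
#define AN_10_HALF  0x0020u
#define AN_10_FULL  0x0040u
#define AN_100_HALF 0x0080u
#define AN_100_FULL 0x0100u

typedef enum {
    LINK_NONE,
    LINK_10_HALF,
    LINK_10_FULL,
    LINK_100_HALF,
    LINK_100_FULL
} link_mode_t;

/* Resolve the operating mode from the negotiation registers.
 * partner_autoneg: link partner took part in auto-negotiation.
 * detected_100:    speed sensed by the PHY when it did not
 *                  (parallel detection); ignored otherwise. */
static link_mode_t resolve_link_mode(uint16_t advertised,
                                     uint16_t partner,
                                     bool partner_autoneg,
                                     bool detected_100)
{
    if (!partner_autoneg) {
        /* No negotiation: duplex cannot be learned, use half. */
        return detected_100 ? LINK_100_HALF : LINK_10_HALF;
    }

    /* Highest common ability wins (100BASE-T4 is ignored here). */
    uint16_t common = (uint16_t)(advertised & partner);
    if (common & AN_100_FULL) { return LINK_100_FULL; }
    if (common & AN_100_HALF) { return LINK_100_HALF; }
    if (common & AN_10_FULL)  { return LINK_10_FULL; }
    if (common & AN_10_HALF)  { return LINK_10_HALF; }
    return LINK_NONE;
}

/* True when the MAC should be configured for full duplex. */
static bool mode_is_full_duplex(link_mode_t mode)
{
    assert(mode >= LINK_NONE && mode <= LINK_100_FULL);
    return mode == LINK_10_FULL || mode == LINK_100_FULL;
}
```

Under these assumptions, ETH_CFG_TX_DPLXEN would be passed to EthernetConfigSet() only when mode_is_full_duplex() returns true, and the check would be repeated whenever the link comes up. This addresses only the duplex mismatch; whether it has any effect on the receive stall seen in half duplex is not something the sketch can answer.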
The datasheet isn't helping much either, except I found this interesting comment:
At the MAC layer, the transmitter can be configured for both full-duplex and half-duplex operation by using the DUPLEX bit in the MACTCTL register. Note that in 10BASE-T half-duplex mode, the transmitted data is looped back on the receive path.
That last sentence is causing me some worry. Does the Ethernet controller disregard these looped back data, and why doesn't it happen at 100Mb? Or is this simply a statement of what happens in a hub and not what happens inside the Stellaris?
In this case I might be tempted to take the easy way out and just decide not to use a hub, after all they are on the verge of being obsolete. But I keep a hub around to monitor Ethernet communications on embedded projects using Wireshark.
So to summarize, when connected to a hub and set to full duplex, the Ethernet controller is stable and never gets into this bad state, but since it's talking full duplex on a hub, lots of packets are lost.
When connected to a hub and set to half duplex, very little or no packet loss is experienced, until the Ethernet controller decides not to talk anymore.
I should add that I put enet_lwip on an EK-LM3S9B92 board, with the ETH_CFG_TX_DPLXEN parameter removed, and can cause the exact same symptoms. I connect my PC, Stellaris EK board, and my LAN to the hub. I transfer large amounts of data between my PC and LAN, while pinging the EK with my PC, and within a minute or two the Stellaris Ethernet controller stops Rx; the activity LED even refuses to blink. Same exact issue I'm seeing on my custom board. I was using a ping with a longer payload, but have found that the standard Windows ping parameters will kill it too.
I have a feeling my last post is getting lost.
I have recreated the problem on a TI eval board, with slightly modified TI code. If someone is looking into this already, just let me know. I just want to make sure this isn't forgotten as I have exhausted every other opportunity to help myself.
Sorry not getting lost, I had thought you were still in process on some testing.
Help me understand. You have our code on our dev kit which works. And you have modified our code and run on the dev kit and it fails. This is narrowing down the issue to something to do with the changes you made to the software. That is, if I understand everything up to this point correctly.
1) I talked with some others about this and they reminded me that the PHY should work by default with a 10Mb/s hub using the auto-negotiate features. This should be how our code works. Since you specifically modified that auto-negotiate section of code, that seems like the place to start. Start backing out changes from your code versus ours and see when things start to work again.
2) Try putting all of the auto-negotiate stuff back into your custom code. This is a long shot but sometimes it's worthwhile to swing for the fences.
By auto-negotiate I specifically mean the bits in the PHY/MAC that allow it to self-determine link speed and half or full duplex.
That second to the last post of mine was perhaps confusing. I will summarize.
I connected the TI dev kit, my PC, and LAN to the 10Mb hub. I began transferring files from the LAN to the PC to generate traffic. Simultaneously I sent continuous pings to the dev kit from the PC.
With the TI dev kit and enet_lwip code, unchanged, it works but many pings fail. Maybe 25 - 30% of pings fail.
But, if I remove the parameter ETH_CFG_TX_DPLXEN from the EthernetConfigSet() function, very few or no pings fail. This is good. But after a minute or two the PHY decides it no longer wants to talk to the 10Mb hub. The activity light on the dev kit no longer lights, but the link is maintained, and link light stays on. Oddly enough, moving the dev kit to the 100Mb switch makes the dev kit talk again without having to reset it. Of course, resetting the dev kit fixes it too, for awhile.
I did modify only what was being passed to EthernetConfigSet() as described. But in my application, packet loss is so bad with ETH_CFG_TX_DPLXEN that it doesn't work. Removing this makes things work perfectly, but only for a few minutes.
As mentioned previously, when connected to a switch, there are absolutely no problems. I only have 100Mb switches so I don't know about 10Mb on a switch (if that even exists).
I thought the test I did with the dev kit and enet_lwip is good because it's easy for someone else to replicate, and uses TI's code and hardware to demonstrate the issue.
Hi,
Anything I can try regarding this?
I think I'm just going to email tech support. I can be patient but unless I get a "hang on, we're still working on it" response, I'll assume it's been forgotten.
"Hang on, we're still working on it"
I have been personally trying to find some time to work on this and it looks like that is not going to happen. However, I did find another member of my team with some time to dig into this and take it the next step.
He'll be picking it up tomorrow so hopefully you will see some questions and feedback in the next 48 hours.
I have had a couple of other thoughts that you can maybe look into in the mean time. First, have you tried this with different 10Mbps hubs?
Also, I see two potential solutions that would get you up and running. Either figure out why the half-duplex version is going off the bus and fix it, or use the full-duplex version, figure out why 20-30% of the packets are dropped, and reduce the packet drop to a manageable level.
I am leaving open the possibility that this is an issue with the PHY. However, since that will be a very difficult investigation and hard to prove, I would like to keep exploring other ways to get your system operational.