I am experiencing an issue with the lwIP 1.4.1 stack on the TM4C1292NCPDT chip. After a variable amount of time of TCP communication and responses, generally at least eight hours, the TCP port abruptly stops responding to TCP packets. Using debug code and Wireshark, I can trace incoming packets through the lwIP stack and into the application layer, and see that my application layer generates a response, but the response never exits the stack, and further TCP communication is impossible. Using lwIP stats, I can see that on a failed unit the TCP memerr counter reaches very large values, many tens of thousands of events. Before failure, the TCP memerr counter reports no events.
UDP communication is unaffected, and TCP communication over a different connection is also still possible.
For many years, and across many thousands of devices, I used TI's lwIP 1.3.0 port on the LM3S6911 without a single occurrence of this issue. It appears this is an issue with the TCP send mechanism in lwIP 1.4.1.
This post seemed promising, but implementing the fix outlined did not solve the problem: http://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/p/374100/1316428#1316428
Bravo - a very well constructed, detailed, caring posting - very well done! I could not bear to see your writing, "just sit."
Far "over my head" in matters "TCP" - yet maybe the following provides some aid.
Again - operating w/out "TCP" knowledge - might it prove useful to "completely control" both the input & output responses - limiting such to elementary (brief) transactions? With this level of control - perhaps the failure times may converge - which should "point the way" toward the discovery of the failure mechanism...
Firm/I have enjoyed much diagnostic success - even when - and (sometimes) especially when - we know NOTHING of the client's (failed) application...
Thank you, Amit. Is it not true that, "A fix unknown or not quickly/easily found" may be described as ineffective? That's a pity - so hard to justify and/or understand...
Hi Amit,
Many thanks for the answers. Here are answers to your specific questions:
TivaWare: I’m not sure of the original version we were using (I will check into that), but we upgraded to 2.1.3.156 over the weekend and performed some testing. Seven of eight test units failed within 4-5 hours, and one lasted overnight, which is fairly consistent with our earlier testing results. As before, we are able to communicate over UDP, or over a different TCP connection, to these units.
IP address: I don’t have a way to check this externally at the moment, but the fact that I can open additional TCP connections, and communicate over UDP, makes me believe that the IP address is still valid in the stack.
Let me know what other testing I can do to help you narrow down this issue.
Thanks,
Mike
Hi Amit,
Unfortunately not. We've tested increasing the stack size from 2k to 10k with no measurable improvement in uptime. I verified the previous version of TivaWare we were using: 2.0.1.11577.
Thanks,
Mike
Hello Michael
Can you please send the lwipopts.h file from the project that gives the error, the Wireshark log from when the error occurs, and the debug output from lwIP?
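For readers following along, these are the lwipopts.h settings most directly tied to TCP memerr counts in lwIP 1.4.1. The values below are illustrative examples only, not the settings from the project's actual attached file:

```c
/* Illustrative lwipopts.h excerpt -- example values, not the project's
 * actual configuration. */
#define LWIP_STATS              1   /* collect lwip_stats */
#define TCP_STATS               1   /* including lwip_stats.tcp.memerr */

#define MEM_SIZE                (16 * 1024)  /* heap used for PBUF_RAM sends */
#define MEMP_NUM_TCP_SEG        32           /* max queued TCP segments */
#define PBUF_POOL_SIZE          24           /* pool pbufs for received frames */

#define TCP_MSS                 1460
#define TCP_SND_BUF             (4 * TCP_MSS)
/* TCP_SND_QUEUELEN must be sized to cover TCP_SND_BUF; if it is too
 * small, tcp_write() returns ERR_MEM and the memerr counter climbs. */
#define TCP_SND_QUEUELEN        (4 * TCP_SND_BUF / TCP_MSS)
```

If the send-side heap or segment pool is exhausted and never recovers, the symptom matches what is described above: receives still traverse the stack, but every transmit attempt fails allocation.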
Hi Amit,
See attached for the lwipopts.h, and a screenshot with some lwIP stats info. The two drives with 0 TCP memerrs recorded have not yet failed. Note that TCP xmit + TCP memerr is nearly equal to TCP recv for all the failed drives.
The App xmit and App recv numbers are the counts of TCP packets transmitted and received by our application layer before freezing; note the variation, from hundreds of thousands to over 1.7 million packets.
We don't have a JTAG port on this device, so we have to get all our debug info out through a UDP response from pre-coded routines in the units. We can add more information to that packet if needed.
I don't have a Wireshark trace at hand, but I'll capture one and post it.
Thanks,
Mike