In our application, a subset of LM3S9D96 chips (all silicon revision C5) at some point start receiving only some of the incoming Ethernet frames (1) or no frames at all (2).
We use the enet_lwip reference software on our custom control boards based on the LM3S9D96. For scenario (2) in particular, we were able to confirm that no further Ethernet interrupts occur.
To reproduce the failures, we set up heavy stress testing in the form of a flood ping (ping -f) (3) and a cyclic, TCP socket-based request/response protocol (4). With this setup, failure (1) or (2) occurs on bad boards within minutes to a few hours. Good boards run for two weeks or more without failure.
To rule out the board hardware as the cause, we took one board that reproducibly fails and one that reproducibly works, and swapped their main processors (i.e. the LM3S9D96). The result is that failure (1) or (2) follows the relocated processor. Conversely, the previously bad board behaves well with the processor from the previously good board.
From a software perspective, both situations (1) and (2) can be recovered by setting bit 15 ("Reset Registers") in the Ethernet PHY Management Register 0 – Control (MR0). However, this cannot be applied in live operation, because:
- autonegotiation of the Ethernet speed then takes about one second, during which no data communication is possible
- after tracing the contents of all MAC- and PHY-related registers, we found no criterion for detecting the hanging condition
I know the Stellaris line is discontinued, and we are meanwhile rolling out successor boards based on Tiva; nevertheless, we have the structural problem described above with boards in the field.
Please understand that, on top of the failures in the field, the analysis described above has cost us time and effort. Therefore, I am requesting
- comments on the phenomena described (I did not find anything related in the errata sheets or on E2E)
- advice on how to overcome the failure situations