I'm working on an evaluation with a third party of our Ethernet/IP solution and the third party has found some reliability issues that I'm hoping to troubleshoot through this thread.
We have found that at some point the switch stops forwarding packets in one direction (B->A). When this happens, the green LED that normally blinks goes solid, which is the visual indication that the problem has occurred. The network has been pared down to a linear topology, with the DLR supervisor disabled and no background traffic, so there is only a minimal amount of traffic.
Here is the network diagram:
NetAnalyzer is a physical layer tap with 4 ports. It captures packets received on any given port. Unlike regular network interfaces, it does not reject error frames, so every frame is captured, whether good or bad. We believe it can also capture link up/down events, but we are not seeing any of those in this test. The data received by NetAnalyzer is converted to Wireshark format so it can be viewed easily in Wireshark. The Wireshark log can be provided separately via email, as I was not able to upload it to this E2E thread. Note that many files were generated in this capture. The one attached is the one where the TI device stops forwarding traffic in one direction, as will be explained later.
Note how NetAnalyzer port numbers 0-3 are connected. That port number is reflected in the Wireshark capture to indicate which port the packet was received on. This is at the very beginning of the frame decode in Wireshark.
Other info about the network:
1) The TI module is the DUT (Device Under Test) of course.
2) The devices at .132 and .81 are Rockwell ControlLogix controllers with 1756-EN2TR modules.
- These controllers are used as traffic generators normally but for this test, the controllers are actually unplugged from the chassis.
 - So the EN2TR modules are on the network but not contributing traffic.
 
3) The ETAP at .83 is not connected to a Wireshark PC, it is just on the network, also not contributing any packets to the flow.
4) There is no I/O connection to the Point I/O chassis so it is also not contributing any traffic to the network.
5) The CIP Motion Controller at node .133 is typically the ring supervisor, but for this test it is disabled so there are no DLR beacon frames being sent. This is to minimize capture file size.
6) The connection to port 2 of .133 is disconnected so there is no ring.
The traffic flowing between .133 and .134 and between .133 and .135 is CIP Motion traffic. There is one packet flowing in each direction so .133->.134, .134->.133, .133->.135 and .135->.133. We believe the packets flow at a 1.5ms rate. The capture file will show each frame twice because each frame is received on two separate ports in NetAnalyzer, so you have to look at the NetAnalyzer information to see what port a given frame comes in on. It only records frames received on a port, not frames transmitted on a port. Fortunately there’s a pattern that emerges that makes it a little easier to spot when the issue occurs. Here’s the pattern normally, when things are working:
| Port Received | From | To | Description |
|---|---|---|---|
| 3 | 134 | 133 | Packet from drive received on port 3 of NetAnalyzer |
| 0 | 134 | 133 | Previous packet forwarded by the TI switch |
| 3 | 135 | 133 | Packet from drive received on port 3 of NetAnalyzer |
| 0 | 135 | 133 | Previous packet forwarded by the TI switch |
| 1 | 133 | 135 | Packet from controller received on port 1 of NA |
| 2 | 133 | 135 | Previous packet forwarded by TI switch |
| 1 | 133 | 134 | Packet from controller received on port 1 of NA |
| 2 | 133 | 134 | Previous packet forwarded by TI switch |
| Repeat | | | |
Looking at the frames leading up to #330537, you’ll see the above pattern repeating.
At frame 330537 the pattern changes to this:
| Port Received | From | To | Description |
|---|---|---|---|
| 3 | 134 | 133 | Packet from drive received on port 3 of NetAnalyzer |
| 3 | 135 | 133 | Packet from drive received on port 3 of NetAnalyzer |
| 1 | 133 | 135 | Packet from controller received on port 1 of NA |
| 2 | 133 | 135 | Previous packet forwarded by TI switch |
| 1 | 133 | 134 | Packet from controller received on port 1 of NA |
| 2 | 133 | 134 | Previous packet forwarded by TI switch |
| Repeat | | | |
What this shows is that the DUT quits forwarding packets that come into Port B, but still forwards packets in the opposite direction. This means that packets coming from the two K6500 drives (.134 and .135) are not getting to the controller, and eventually, at frame 331293, the CIP Motion controller gives up sending its packets. This is an indication that the connection has timed out. Shortly thereafter, in frames 331496 and 331498, the CIP Motion controller closes the connections with the drives. It later tries to re-open them, but because its requests are not getting responses from Port B to Port A through the DUT, the connections never re-establish.
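For anyone who wants to find the transition point in one of the other capture files without scrolling through it by hand, here is a minimal Python sketch, not a tested tool. It assumes the capture has been exported to simple (frame number, NetAnalyzer port, source node, destination node) records. The function name `first_unforwarded`, the `FORWARD_PORT` mapping, the `window` parameter and the sample data are all hypothetical. The port pairing (3 to 0, 1 to 2) is inferred from the normal pattern described above.

```python
# Hypothetical record layout: (frame_number, netanalyzer_port, src_node, dst_node).
# Each ingress port is paired with the port that should see the DUT's forwarded copy.
FORWARD_PORT = {
    3: 0,  # frame from a drive (B side) should reappear on port 0
    1: 2,  # frame from the controller (A side) should reappear on port 2
}


def first_unforwarded(records, window=3):
    """Return the first ingress record with no forwarded copy in the next `window` records."""
    for i, (frame, port, src, dst) in enumerate(records):
        egress = FORWARD_PORT.get(port)
        if egress is None:
            continue  # this record is itself a forwarded copy
        lookahead = records[i + 1:i + 1 + window]
        if len(lookahead) < window:
            break  # too close to the end of the capture to judge
        if not any(p == egress and s == src and d == dst for _, p, s, d in lookahead):
            return frame, port, src, dst
    return None


# Synthetic data only: two normal cycles followed by two cycles of the failure pattern.
normal = [(3, 134, 133), (0, 134, 133), (3, 135, 133), (0, 135, 133),
          (1, 133, 135), (2, 133, 135), (1, 133, 134), (2, 133, 134)]
failed = [(3, 134, 133), (3, 135, 133),
          (1, 133, 135), (2, 133, 135), (1, 133, 134), (2, 133, 134)]
records = [(n + 1, *r) for n, r in enumerate(normal * 2 + failed * 2)]

print(first_unforwarded(records))
```

The small look-ahead window is there so that a forwarded copy from a later cycle is not mistaken for the missing one. On a real capture the window may need adjusting if frames from the two directions interleave differently.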
So that’s what I know at this point in time.
After some further testing, we were able to reproduce the issue on an even simpler network. We set up a CompactLogix controller (L36ERM), the TI device, and a Point I/O adapter. Here’s the diagram.
The connection between the CompactLogix and the Point I/O adapter had not been fully set up before we were interrupted by a meeting. On returning about an hour later, the TI board’s LED was solid green and no traffic would pass from Port 1 to Port 0.
The only traffic on the network when this happened was RSLinx browsing and the Point I/O adapter responding, plus possibly some ARPs and other PC network ‘stuff’.
Please advise next steps in troubleshooting this issue.
Thanks,
Stuart
				
                          

