As a follow-up to several related issues with running the PRU-ICSSG networks on a custom board with a third-party OS and a custom driver, we've run into yet another issue.
Our driver has been ported to support the more recent versions of the TI-provided PRU firmware; however, we're experiencing major stability issues running FW version 02.02.11.02. This FW seems to trigger the dreaded "RX path stall" issue we've discussed in several other threads on this forum (e.g. https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1175207/am6548-pru-icssg-sr-2-0-receive-stalls) more or less all the time.
Interestingly, we're seeing much better results running FW 02.02.11.01. The RX stall issue has not been observed on this version yet (though we haven't done any testing at scale). However, we've run into a new problem:
We have a test case where we send UDP packets between two PRU-ICSSG Ethernet interfaces, in this case PRU-ICSSG1.slice0 and PRU-ICSSG2.slice0 (aka emac2 and emac4). In a handful of cases we observe lost packets (roughly 4-5 out of 1000 back-and-forth transactions). Inspecting the traffic in Wireshark shows that the lost packets do appear on the network, but the first 50-60% of the frame is gone, including the Ethernet header, UDP header and so on. (We have a sequence number at the very end of the data section, which lets us confirm that the corrupted packet is the one that goes missing from the software's perspective.)
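For anyone trying to reproduce this, here is a minimal sketch of how a trailing sequence number lets you identify a frame even after its headers are gone. It assumes (hypothetically; the layout isn't specified above) a 14-byte Ethernet + 20-byte IPv4 header without options + 8-byte UDP header, giving a 1472-byte payload in a 1514-byte frame, with a 4-byte big-endian sequence number in the last 4 bytes. It also assumes the capture does not include the FCS. `make_payload` and `seq_from_tail` are illustrative helpers, not part of our driver.

```python
import struct

# Assumed (hypothetical) layout: Ethernet + IPv4 (no options) + UDP headers
ETH_HDR, IP_HDR, UDP_HDR = 14, 20, 8
FRAME_LEN = 1514
PAYLOAD_LEN = FRAME_LEN - ETH_HDR - IP_HDR - UDP_HDR  # 1472 bytes
SEQ_FMT = "!I"  # assumed: 4-byte big-endian sequence number
SEQ_LEN = struct.calcsize(SEQ_FMT)


def make_payload(seq: int) -> bytes:
    """Build a UDP payload whose last 4 bytes carry the sequence number."""
    # Position-dependent fill pattern, so a shifted or truncated frame is easy to spot.
    filler = bytes(i & 0xFF for i in range(PAYLOAD_LEN - SEQ_LEN))
    return filler + struct.pack(SEQ_FMT, seq & 0xFFFFFFFF)


def seq_from_tail(captured: bytes) -> int:
    """Recover the sequence number from the end of a captured frame.

    Works even when the front of the frame (Ethernet/IP/UDP headers) is
    missing, as long as the tail survived. Assumes no FCS in the capture.
    """
    if len(captured) < SEQ_LEN:
        raise ValueError("capture too short to contain a sequence number")
    (seq,) = struct.unpack(SEQ_FMT, captured[-SEQ_LEN:])
    return seq
```

Because only the tail is read, a frame that lost its first ~900 bytes on the wire still maps back to the transaction the application considered lost.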
Some additional information:
- This only occurs at gigabit speed; we have not seen any such issues at 100 Mbit/s
- We have been able to locate the TX frame data in the MSMC RAM. At this stage the frame appears complete and correct, even for frames that later show up corrupted on the network.
- This suggests that the corruption happens further down the stack (PRU FW, HW, MAC->PHY)
- Our interpretation is that this rules out our Ethernet driver, as well as UDMA and cache coherency issues
- The expected frame size is 1514 bytes, Ethernet/UDP headers included; the corrupted packets seem to be in the high-500 to high-600 byte range.
- The amount of missing data seems to be a multiple of 32 bytes
- The corrupted frames are always truncated from the front (fruncated?); the last part of the frame always appears correct in Wireshark.
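To check the "front-truncated by a multiple of 32 bytes" observation across many captures, a comparison along these lines could be run against the frame found in MSMC RAM. This is a rough sketch under assumptions: both buffers are taken to cover the same byte range (no preamble or FCS), the 32-byte `BLOCK` value is just the granularity we observed, not a documented FW or HW constant, and `missing_prefix_len` / `truncation_summary` are hypothetical helper names. As an illustration of the arithmetic, a 1514-byte frame missing 896 bytes (28 × 32) would show up as a 618-byte capture.

```python
from typing import Iterable, List, Optional, Tuple

BLOCK = 32  # granularity observed in this thread; not a documented FW/HW constant


def missing_prefix_len(expected: bytes, captured: bytes) -> Optional[int]:
    """Bytes lost from the front of `captured` relative to `expected`.

    Returns 0 for an intact frame, and None if `captured` is not an exact
    suffix of `expected` (i.e. damage other than a clean front truncation).
    """
    if len(captured) > len(expected) or not expected.endswith(captured):
        return None
    return len(expected) - len(captured)


def truncation_summary(
    expected: bytes, captures: Iterable[bytes]
) -> List[Tuple[int, Optional[int], Optional[bool]]]:
    """Return (captured_len, missing_bytes, block_aligned) for each damaged frame."""
    rows = []
    for cap in captures:
        missing = missing_prefix_len(expected, cap)
        if missing == 0:
            continue  # intact frame, nothing to report
        # None marks corruption that is not a clean front truncation
        aligned = None if missing is None else missing % BLOCK == 0
        rows.append((len(cap), missing, aligned))
    return rows
```

A row with `block_aligned` set to False or None would contradict the pattern described above and would be worth looking at separately.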
Hopefully, this is a known issue with a known solution - if not, how do we proceed from here?
I'm happy to provide network traffic logs, memory dumps or any other debug info that might be required.
In the meantime, I'll make another attempt at moving up to PRU-ICSSG FW 02.02.11.02.
/Daniel