I'm working with a customer who is experiencing timeouts when trying to TFTP large files. With particularly large files, the timeouts accumulate until they reach the limit of 10, at which point the transfer fails. The customer is running U-Boot SPL 2020.01-g3c9ebdb87d. The Linux host on the other end of the transfer is running SLES 15 (recently updated from HeliOS 7). They also experienced timeouts with HeliOS 7, but less frequently. The Wireshark capture also shows retransmitted TFTP ACK and DATA packets.
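To illustrate why only the large transfers fail, here is a hypothetical Python model of the behavior described above, in which timeouts accumulate over the whole transfer instead of resetting after each successful block. The names `simulate_transfer`, `lost_blocks` and `reset_per_block` are invented for illustration. Whether U-Boot 2020.01 actually resets its counter per block should be confirmed against its `net/tftp.c`; this sketch does not claim to reproduce that code.

```python
from dataclasses import dataclass

TIMEOUT_LIMIT = 10  # the retry limit described in the report


@dataclass
class TransferResult:
    ok: bool              # True if every block arrived
    blocks_received: int  # last block accepted before success/abort
    timeouts: int         # value of the timeout counter at the end


def simulate_transfer(total_blocks: int,
                      lost_blocks: dict[int, int],
                      limit: int = TIMEOUT_LIMIT,
                      reset_per_block: bool = False) -> TransferResult:
    """Hypothetical model of a TFTP read with a bounded timeout counter.

    lost_blocks maps a block number to how many times its DATA packet is
    lost before it finally gets through. Each loss costs one timeout.
    """
    timeouts = 0
    for block in range(1, total_blocks + 1):
        if reset_per_block:
            timeouts = 0  # alternative policy: only count retries for this block
        for _ in range(lost_blocks.get(block, 0)):
            timeouts += 1  # client times out and re-ACKs block - 1
            if timeouts >= limit:
                return TransferResult(False, block - 1, timeouts)
        # DATA for this block arrives and is ACKed; with the cumulative
        # policy the counter is NOT reset here.
    return TransferResult(True, total_blocks, timeouts)


# One isolated loss every 1000 blocks (blocks 1000, 2000, ..., 20000).
lost = {b: 1 for b in range(1000, 20001, 1000)}
print(simulate_transfer(20000, lost))
# TransferResult(ok=False, blocks_received=9999, timeouts=10)
print(simulate_transfer(20000, lost, reset_per_block=True))
# TransferResult(ok=True, blocks_received=20000, timeouts=1)
```

Under the cumulative policy, ten isolated losses anywhere in the file are enough to abort, so the failure rate grows with file size even though no single block is retried more than once. With a per-block counter, the same loss pattern completes.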
I did a bit of searching and found that the issue being reported is essentially the same as this E2E post about AM64x:
I followed up privately with Schuyler to see if he ever determined the root cause or found a solution. He provided me with the following commentary:
Looking at the Wireshark trace, I see the redundant packet they mention. This link describes someone else having a similar issue. In a couple of other similar threads, the suggested solution is to put both the server and the device on an isolated subnet, or at least on the same switch. A 10.x.x.x network is typically large, and the TFTP packets may be crossing bridges. This may not be a good long-term solution, but it would be worth finding out whether the issues go away on an isolated network.
The sequence makes me think that U-Boot missed a TFTP packet and resent the acknowledgement of the previous packet to trigger the next one.
https://u-boot.denx.narkive.com/1Hq48VWQ/users-tftp-timeout
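The recovery sequence Schuyler describes (re-sending the ACK for the previous block after a missed DATA packet) can be sketched as a small receive-side state machine. This is a hypothetical Python model based on the packet layout in RFC 1350 (ACK = opcode 4 plus a 16-bit block number; DATA = opcode 3, block number, then up to 512 bytes by default). It is not U-Boot's implementation, and the class and function names are invented for illustration.

```python
import struct

OP_DATA = 3  # RFC 1350 opcode for DATA
OP_ACK = 4   # RFC 1350 opcode for ACK


def make_ack(block: int) -> bytes:
    """Encode an ACK: 2-byte opcode + 2-byte block number, network byte order."""
    return struct.pack("!HH", OP_ACK, block & 0xFFFF)


def parse_data(packet: bytes) -> tuple[int, bytes]:
    """Split a DATA packet into (block number, payload)."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != OP_DATA:
        raise ValueError(f"not a DATA packet (opcode {opcode})")
    return block, packet[4:]


class TftpReadReceiver:
    """Hypothetical receive side of a TFTP read (client downloading a file)."""

    def __init__(self, block_size: int = 512):
        self.block_size = block_size  # 512 unless a blksize option was negotiated
        self.last_block = 0           # last block accepted (0 = none yet)
        self.data = bytearray()
        self.done = False

    def on_data(self, packet: bytes) -> bytes:
        block, payload = parse_data(packet)
        # Expected next block; a 16-bit rollover to 0 is assumed here.
        if block == ((self.last_block + 1) & 0xFFFF):
            self.data += payload
            self.last_block = block
            # A short final block ends the transfer.
            self.done = len(payload) < self.block_size
        # Duplicates and out-of-order blocks are dropped; either way we
        # (re-)acknowledge the last block we actually accepted.
        return make_ack(self.last_block)

    def on_timeout(self) -> bytes:
        # No DATA arrived in time: resend the ACK for the previous block so the
        # server retransmits the next one. (Before any block has arrived, a real
        # client would resend its read request instead.)
        return make_ack(self.last_block)


rx = TftpReadReceiver()
block1 = struct.pack("!HH", OP_DATA, 1) + bytes(512)
print(rx.on_data(block1).hex())  # 00040001
print(rx.on_timeout().hex())     # 00040001 (duplicate ACK for block 1)
```

In a Wireshark capture, each `on_timeout` call in this model would appear as a repeated ACK carrying the same block number, followed (if the server responds) by a retransmitted DATA packet.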
The customer has followed the advice to put the DRA821U and the host machine on an isolated subnet by connecting them directly (no switch in between). The issue still occurs in this configuration.
Questions:
- Is this a known issue (possibly with a known resolution)?
- What are the next steps to help the customer resolve this issue?
Thanks,
Stuart