Sitara system (more specifically the NDK) on some systems produces many "Retransmit Timeout Errors"

cemalettin aykan

Intellectual 260 points

Other Parts Discussed in Thread: AM3359

System setup:

Client: PC (x86) system - Windows based (QtTCPServerSocket)

Server: TI Sitara AM3359 (including SYS/BIOS 6.41.0.26, NDK 2.24.1.18, Compiler TI v5.2.2)

CCS: Version 6.1

Problem:

The Client cyclically sends a TCP packet to the Server, and the Server acknowledges it with an explicit ACK message. However, when the Server sends a TCP packet to the Client, the Client acknowledges only some of the TCP packets with an explicit ACK message! The other TCP packets seem to be acknowledged with an implicit ACK (presumably in accordance with the RFC standards).

On the CCS console output, you can see the printf message "TcpTimeoutRexmt: Retransmit Timeout" appearing cyclically (400 ms).

In the attachment, please find the Wireshark capture for the good case and the bad case.

The attachment also shows CCS with the console output and tcptime.c from the NDK.

2625.SnapShots.pdf

over 8 years ago

0 David Friedland over 8 years ago

TI__Mastermind 18320 points

Cemalettin,

Unfortunately, the person most familiar with the NDK is out on paternity leave for the rest of the year. I will see if I can find anyone else who can answer. The few past forum threads regarding retransmit timeout errors (e.g., e2e.ti.com/.../393296) all seem to point to a stack size setting that is too small, so you should make sure that this is not the case for your application.

0 cemalettin aykan over 8 years ago in reply to David Friedland

Intellectual 260 points

Hi,

What do you mean by stack size? The stack size on the client side (PC) or on the server side (Sitara)?

Thanks.

0 janet over 8 years ago in reply to cemalettin aykan

TI__Mastermind 22760 points

Hi Cemalettin,
I don't think the stack size is your problem. Maybe this post is more relevant?

e2e.ti.com/.../238306

Best regards,
Janet

0 cemalettin aykan over 8 years ago in reply to janet

Intellectual 260 points

Hi Janet,

I read the post you linked above and tried some experiments with the TCP transmit buffer size, as proposed there.
Unfortunately, the problem is still present!

Do you need any additional information to solve the problem?

0 janet over 8 years ago in reply to cemalettin aykan

TI__Mastermind 22760 points

Hi Cemalettin,

I asked our NDK expert about this and he thinks this could be normal behavior. On a new connection, TCP does not know how long it will take to get an ACK from the other side of the connection. It waits some default time for an ACK, and if the ACK is not received within that time, it retransmits the data, increasing the retransmit time. This is how TCP auto adjusts itself to adapt to good or bad connections between two hosts, high network load, or to adjust to slow response time due to a high load on a server, for example.
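The adaptive behavior described above can be sketched in C. This is a minimal illustration of the standard RTT-based timeout estimator from RFC 6298 with exponential backoff on timeout; it is not the NDK's implementation. The type `rto_state`, the function names, and the floor and ceiling values are all hypothetical and chosen only for the example, and clock granularity is ignored.

```c
/* Hypothetical estimator state; all times are in milliseconds. */
typedef struct {
    double srtt;        /* smoothed round-trip time */
    double rttvar;      /* round-trip time variation */
    double rto;         /* current retransmission timeout */
    int    has_sample;  /* 0 until the first RTT measurement arrives */
} rto_state;

/* Illustrative bounds only; these are NOT the NDK's values. */
#define RTO_MIN_MS 200.0
#define RTO_MAX_MS 60000.0

static double clamp_ms(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Start with a default timeout, since no RTT is known on a new connection. */
void rto_init(rto_state *s, double initial_rto_ms)
{
    s->srtt = 0.0;
    s->rttvar = 0.0;
    s->rto = initial_rto_ms;
    s->has_sample = 0;
}

/* Called when an ACK yields a new round-trip measurement (RFC 6298 rules). */
void rto_on_rtt_sample(rto_state *s, double rtt_ms)
{
    if (!s->has_sample) {
        s->srtt = rtt_ms;
        s->rttvar = rtt_ms / 2.0;
        s->has_sample = 1;
    } else {
        double err = s->srtt - rtt_ms;
        if (err < 0.0) {
            err = -err;
        }
        s->rttvar = 0.75 * s->rttvar + 0.25 * err;
        s->srtt = 0.875 * s->srtt + 0.125 * rtt_ms;
    }
    s->rto = clamp_ms(s->srtt + 4.0 * s->rttvar, RTO_MIN_MS, RTO_MAX_MS);
}

/* Called when the retransmit timer expires: retransmit and back off. */
void rto_on_timeout(rto_state *s)
{
    s->rto = clamp_ms(s->rto * 2.0, RTO_MIN_MS, RTO_MAX_MS);
}
```

With this sketch, a connection that starts at a 1000 ms timeout doubles it to 2000 ms after one expiry, and then settles to 300 ms after a first 100 ms RTT measurement. The floor matters when measured RTTs are very small, because the computed timeout is then pinned at the minimum.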

Looking at a couple of the retransmit scenarios in the “bad case” from the Wireshark screen shot, I’ll use the last digits of the IP addresses (“10” and “2”) to identify which network host is which.

Host 10 (PC?) retransmits to host 2 (embedded device?):

1. Packet 4:
  1. Host 10 sends data to host 2
  2. Time: 7.7624 seconds
2. Packet 5:
  1. Host 10 retransmits packet 4, because it did not receive an ACK for packet #4 from host 2
  2. Time: 8.0176 seconds (~255 ms later)
  3. Host 10 waited around 250 ms for an ACK from host 2. It never received it, so it retransmitted here.
  4. It sounds like host 10’s retransmit timer is set to about 250 ms at this point, so it waited that long before retransmitting
3. Packets 6 and 7:
  1. Host 2 ACKs packets 4 and 5; note that the ACK of packet #5 is a duplicate of the ACK sent in #6
  2. Note the times here:
    1. Packet 6: Time: 8.2658 seconds
    2. This is ~500 ms AFTER host 10 sent the data in packet #4!
    3. Packet 7: Time is immediately after packet 6 (it just sends the two ACKs back to back)

So, these seem normal. TCP is just adjusting to the network and/or the host response times.

The only thing that seems extreme is the amount of time it took host 2 to respond (packet #6) to packet #4 (~500ms).

That seems like a long time. So maybe this was happening at a time when the network was under high load. If not, then maybe host 2 itself had a high load. If host 2 is the target, then what was going on at this time? Was it doing a lot of work in other Tasks that had higher priority than the NDK? If so, that would be a possible reason, and the NDK sent the ACK as soon as it was given processor time.
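For reference, the intervals quoted in this walkthrough follow directly from the capture timestamps. The small C helper below is only a sketch for checking that arithmetic; `delta_ms` and the constant names are hypothetical, while the timestamp values are the ones quoted for packets 4, 5, and 6.

```c
/* Convert two capture timestamps (in seconds, as quoted from the
 * Wireshark capture) into the interval between them in milliseconds. */
double delta_ms(double t_start_s, double t_end_s)
{
    return (t_end_s - t_start_s) * 1000.0;
}

/* Timestamps quoted in the walkthrough. */
static const double T_PACKET4 = 7.7624;  /* original data from host 10 */
static const double T_PACKET5 = 8.0176;  /* retransmission by host 10 */
static const double T_PACKET6 = 8.2658;  /* first ACK from host 2 */
```

Here `delta_ms(T_PACKET4, T_PACKET5)` is about 255 ms (the retransmit delay), and `delta_ms(T_PACKET4, T_PACKET6)` is about 503 ms (the time host 2 took to respond).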

The rest of the retransmits shown are the opposite – host 2 is retransmitting data because host 10 doesn’t ACK in time, so this “issue” goes both ways.

Here, it looks like the retransmit threshold is around 200 ms, as the retransmits were sent after about 205 ms in one of the cases. In this opposite scenario, we could ask the same questions. What is happening on the network when this occurs? Or is host 10 under high CPU load at that time, resulting in a delayed ACK?

Best regards,

Janet

0 cemalettin aykan over 8 years ago in reply to janet

Intellectual 260 points

Hi Janet,

thank you for the quick response!

The main problem I have is on the target side (IP: 2), i.e., the TI Sitara AM3359 (including NDK 2.24.1.18).

On the target side, we have no high CPU load.

As you can see from the snapshot below, a TCP retransmission from the target occurs approximately every 500 ms. This costs performance on the target side!

The retransmit threshold is around 150 ms.

For performance reasons, we have changed the retransmit threshold in the NDK as follows:

#define TCPTV_RTXMIN 10 /* minimum value retransmit timer */

(old value: 2)

#define TCPTV_MINIMALMAXRTT 10 /* maximum allowed RTT default (.1 sec) */

(old value: 2)
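To illustrate why raising a minimum retransmit value could silence the timeouts, here is a minimal sketch. It assumes that these constants are counted in timer ticks and that one tick is 100 ms; neither is confirmed in this thread. `rexmt_fires` and `ASSUMED_TICK_MS` are hypothetical names, not part of the NDK.

```c
/* ASSUMPTION: TCP timer constants are expressed in ticks of this length. */
#define ASSUMED_TICK_MS 100

/* Returns 1 if a retransmit timer with a floor of `rtxmin_ticks` would
 * expire before an ACK that arrives `ack_delay_ms` after the segment was
 * sent, and 0 otherwise. `computed_rto_ticks` is the timeout the stack
 * would use if no floor were applied. */
int rexmt_fires(int computed_rto_ticks, int rtxmin_ticks, int ack_delay_ms)
{
    int rto_ticks = computed_rto_ticks;
    if (rto_ticks < rtxmin_ticks) {
        rto_ticks = rtxmin_ticks;   /* apply the minimum */
    }
    return ack_delay_ms >= rto_ticks * ASSUMED_TICK_MS;
}
```

Under these assumptions, an ACK that arrives about 205 ms after the data would trigger a retransmit with a floor of 2 ticks, but not with a floor of 10 ticks. The trade-off is that a genuinely lost segment is then also retransmitted later.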

What is your opinion of the changes above?

Do you have any alternative proposal?

Thanks.

Kind regards,

Cemal

0 janet over 8 years ago in reply to cemalettin aykan

TI__Mastermind 22760 points

Hi Cemal,
I'm not sure how those #defines are used. Did changing them have an effect?
What is your network topology?
Thanks,
Janet

0 cemalettin aykan over 8 years ago in reply to janet

Intellectual 260 points

Hi Janet,

The effect is that no retransmission timeout errors occur anymore!

My network topology is as follows:

PC (client: IP:10) --------<connection via ethernet>------------> Target (sitara: IP:2)

The PC cyclically requests data (read requests) from the Target.

Kind Regards,

Cemal
