CCS/TMS320F28388D: TCP communication fail

Part Number: TMS320F28388D

Tool/software: Code Composer Studio

I am writing an application that runs on the M4 CPU; the code is based on the example project tcpEchoF2838X.

I can send and receive data to and from a PC application in a loop without problems, until suddenly (after a few hours) it stops and the M4 seems to be dead (one of the threads increments a counter, and that counter is frozen).

To wake it up I need to reset the whole device, or reset the M4 from CPU1 (with SysCtl_controlCMReset).
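
The "frozen counter" symptom can be turned into an automatic check. The following is only a sketch, assuming the supervising core can read the M4 thread's counter and a millisecond timer; how the counter is shared between the cores is not described in this post. `HeartbeatMonitor`, `heartbeat_init` and `heartbeat_stalled` are hypothetical names, not part of C2000Ware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor state kept on the supervising core (e.g. CPU1). */
typedef struct {
    uint32_t last_count;     /* last heartbeat value observed */
    uint32_t last_change_ms; /* time at which the value last changed */
    uint32_t timeout_ms;     /* how long a frozen counter is tolerated */
} HeartbeatMonitor;

static void heartbeat_init(HeartbeatMonitor *m, uint32_t count,
                           uint32_t now_ms, uint32_t timeout_ms)
{
    m->last_count = count;
    m->last_change_ms = now_ms;
    m->timeout_ms = timeout_ms;
}

/* Returns true when the counter has not moved for at least timeout_ms. */
static bool heartbeat_stalled(HeartbeatMonitor *m, uint32_t count,
                              uint32_t now_ms)
{
    if (count != m->last_count) {
        /* The monitored thread made progress: restart the timeout. */
        m->last_count = count;
        m->last_change_ms = now_ms;
        return false;
    }
    /* Unsigned subtraction stays correct across timer wrap-around. */
    return (uint32_t)(now_ms - m->last_change_ms) >= m->timeout_ms;
}
```

When `heartbeat_stalled` returns true, the caller could trigger the reset mentioned above. This only works around the hang; it does not explain it.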

I changed the buffer sizes to 1024 (the original value was 2048). Could this be the root cause of the problem? I have now set them back to 2048 and the test is running again.

Here is the relevant code that sets the sizes (I am not sure whether it is the reason for the M4 failure):

 

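static void initTcp(void *hCfg)
{
    /* TCP socket buffer sizes in bytes. Declared as uint32_t so that the
       variables match the sizeof(uint32_t) passed to CfgAddEntry(). */
    uint32_t transmitBufSize = 2048; /* previously tried: 1024 */
    uint32_t receiveBufSize  = 2048; /* previously tried: 1024 */
    uint32_t receiveBufLimit = 2048; /* previously tried: 1024 */

    /* TCP transmit buffer size */
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPTXBUF, CFG_ADDMODE_UNIQUE,
                sizeof(uint32_t), (unsigned char *)&transmitBufSize, NULL);
    /* TCP receive buffer size */
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXBUF, CFG_ADDMODE_UNIQUE,
                sizeof(uint32_t), (unsigned char *)&receiveBufSize, NULL);
    /* TCP receive limit */
    CfgAddEntry(hCfg, CFGTAG_IP, CFGITEM_IP_SOCKTCPRXLIMIT, CFG_ADDMODE_UNIQUE,
                sizeof(uint32_t), (unsigned char *)&receiveBufLimit, NULL);
}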

Does anyone have an idea what the problem could be?

Avraham

  • Hi Avraham,

    1. Can you please let me know what modifications were made on top of tcpEchoF2838X?

    2. Does it crash after a particular traffic pattern from the Ethernet peer? What are you running on the other side?

    3. Is it possible to capture the traffic with Wireshark or a similar network sniffer and see whether a particular traffic pattern causes this crash?

    Regards,

    Sudharsanan

  • Hi Sudharsanan

    1) I added a UDP socket and listen on it with recvfrom; it gets a message from a PC application every second and responds.

    There is also one TCP socket, as in the original example; over this socket the PC sends and receives data at ~100 kbyte/s.

    Each socket is handled by a separate thread.

    I found that if I increase the message rate on the UDP socket, it fails after a few minutes. Sometimes the M4 is completely dead, and sometimes it is alive (a counter in the thread loop keeps counting) but I can no longer reconnect to the TCP socket.

     

    Avraham

  • I ran the following test:

    1) Open a UDP socket and do nothing with it (no thread calls recv or recvfrom on it, and no data is sent on it):

       server_udp = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

       localAddr.sin_family = AF_INET;

       localAddr.sin_addr.s_addr = htonl(INADDR_ANY);

       localAddr.sin_port = htons(30718);

       status = bind(server_udp, (struct sockaddr *)&localAddr, sizeof(localAddr));

    2) Open a server socket that listens for TCP connection requests and, when a connection is accepted, creates a thread that communicates with the client. This is the same as the original example, except that the maximum number of TCP connections is 1 instead of 3 (#define NUMTCPWORKERS  1).

    A PC application connects to the TCP socket and sends and receives data at a rate of ~100 kbyte/s.

    Another PC application (IONinja) sends short messages (4 bytes each) to the UDP socket (on which, as I said, nothing on the M4 side calls recvfrom) at a rate of ~100 messages/s.

    After a few seconds the TCP socket stops communicating with the PC. If I close it on the server side (by calling “close” on the M4), the PC fails to reconnect to it. If I close it from the client side (PC), it does not seem to be closed on the server side (recv does not return -1).

    I am not sure the UDP socket is the problem (at least not the only one). Anyway, I ran another test in which I don't create any UDP socket and let it run overnight (the PC application communicates with the TCP socket, sending and receiving data at ~100 kbyte/s); so far it has not failed after 11 hours.
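
For anyone who wants to reproduce the UDP load without IONinja, here is a minimal PC-side sketch using POSIX sockets. It sends 4-byte datagrams at a fixed interval; `send_udp_burst` is a hypothetical helper, and the payload (a sequence number) is an assumption, since the post does not say what the 4 bytes contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Sends `count` 4-byte datagrams to ip:port, pausing interval_ms between
 * them. Returns the number of datagrams sent, or -1 on setup failure. */
static int send_udp_burst(const char *ip, uint16_t port, int count,
                          unsigned int interval_ms)
{
    struct sockaddr_in dest;
    struct timespec pause;
    int sent = 0;
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    pause.tv_sec = (time_t)(interval_ms / 1000u);
    pause.tv_nsec = (long)(interval_ms % 1000u) * 1000000L;

    for (int i = 0; i < count; i++) {
        /* Assumed payload: a 4-byte sequence number in network order. */
        uint32_t payload = htonl((uint32_t)i);
        ssize_t n = sendto(fd, &payload, sizeof(payload), 0,
                           (struct sockaddr *)&dest, sizeof(dest));
        if (n == (ssize_t)sizeof(payload))
            sent++;
        if (interval_ms > 0)
            nanosleep(&pause, NULL);
    }

    close(fd);
    return sent;
}
```

Calling it with the board's IP address, port 30718 (the port bound in step 1) and an interval of 10 ms gives roughly the ~100 messages/s used in this test.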

     

    Avraham

  • It failed after ~12 hours; the M4 seems to be dead.

    The test communicates over one TCP socket only (no UDP socket is created, so the UDP socket was probably not the problem).

    Now I will change the software: when a connection is accepted, the tcpHandler thread itself will enter the communication loop and talk to the client (instead of creating a separate thread for this). When the connection is closed, it will return to the accept call.
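
The planned single-thread structure could look roughly like the sketch below. It is written against POSIX sockets on a PC, not against the NDK, so the exact calls on the M4 may differ; `echo_until_closed` and `serve_sequentially` are hypothetical names. Note that with BSD-style sockets a connection closed by the peer is reported by recv returning 0, not -1, so the loop has to treat 0 as "connection closed".

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Echoes data until the peer closes (recv == 0) or an error occurs
 * (recv < 0). Returns the number of bytes echoed, or -1 on error. */
static long echo_until_closed(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0)
            return total;           /* orderly close by the peer */
        if (n < 0)
            return -1;              /* socket error */

        ssize_t off = 0;
        while (off < n) {           /* send() may accept fewer bytes */
            ssize_t w = send(fd, buf + off, (size_t)(n - off), 0);
            if (w <= 0)
                return -1;
            off += w;
        }
        total += n;
    }
}

/* Makes up to max_clients accept attempts and serves each accepted
 * client in the calling thread, one at a time. */
static void serve_sequentially(int listen_fd, int max_clients)
{
    for (int i = 0; i < max_clients; i++) {
        int client = accept(listen_fd, NULL, NULL);
        if (client < 0)
            continue;
        (void)echo_until_closed(client);
        close(client);              /* always release the descriptor */
    }
}
```

Closing the client descriptor on every exit path matters here: if a closed connection is never released on the server side, a server limited to one connection cannot accept a new one.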

    Avraham

  • Hi Avraham,

    Thanks for the update on the tests. We haven't run long-duration or overnight tests with that example; it was only meant to demonstrate use of the NDK stack with a TCP echo example.

    Regards,

    Sudharsanan

  • Hi Avraham,

    Overnight tests run fine for us, without any crash, with the TCP Echo example.

    1. Can you please let us know whether this is happening because of any of the changes you made on top of the TCP Echo application provided in the C2000Ware package?

    2. As I asked earlier, is it possible to capture the traffic so that we can analyze it?

    Regards,

    Sudharsanan

  • Hi

    I ran it with a direct connection for the whole weekend, 60 hours, without any crash. The PC network card was connected by cable directly to the tenor board (not via any switch/router/hub).

    Yesterday I ran it again from noon until morning and it failed after 19 hours; this time the connection went through our central switch (so not a direct connection).

    So maybe the tenor board gets some message from the switch or the outside network that makes it fail.

    I am going to repeat the direct-connection test from today until Sunday.

    >>Can you please let us know if that is happening due to any of the changes that you have done on top of the TCP Echo application provided in C2000Ware package?

    I am not sure; one of the tests we plan to do is to run the original TCP Echo application.

    >> As I asked earlier is it possible to capture the traffic to analyze that?

    I think we will buy a hardware sniffer to monitor the network traffic to and from the tenor board during the test and see whether there is any suspicious packet.

    Avraham

  • Hi,

    avraham levkovizh said:
    So maybe the tenor board gets some message from the switch or the outside network that makes it fail

    Thanks for the debug and update. 

    avraham levkovizh said:
    I am not sure; one of the tests we plan to do is to run the original TCP Echo application.

    Thanks. Please let us know how it goes.

    avraham levkovizh said:
    I think we will buy a hardware sniffer to monitor the network traffic to and from the tenor board during the test and see whether there is any suspicious packet.

    Sure. This link might be of use: https://wiki.wireshark.org/CaptureSetup/Ethernet#Capture_using_a_monitor_mode_of_the_switch

    Regards,

    Sudharsanan

  • Hi, 

    Is there any progress on this debug?

    Regards,

    Sudharsanan

  • In the last test we ran it for 150 hours without a failure.

    The PC was connected directly to the tenor board (not via any router/switch).

    So we still suspect that, when it is connected via our network, it receives some message that causes the failure.

    We haven't bought a hardware sniffer yet, but it is in the plan.

     

    Avraham

  • Hi Avraham,

    Thanks for the update. I shall close this thread. You can open a new thread and link it to this one once you have an update from the Wireshark capture.

    Regards,

    Sudharsanan