This thread has been locked.


CCS/TMDSICE3359: EtherNet/IP PBM allocation issue

Part Number: TMDSICE3359

Tool/software: Code Composer Studio

Hi

I was testing the TMDSICE3359 with EtherNet/IP using a robustness tool and ran into a problem with PBM allocation. After simultaneously establishing and releasing 17 TCP connections to port 44818, it is no longer possible to receive any packets. The PBM allocation in NIMU_ICSS_rxPacket() fails and memory_squeeze_error is incremented continuously.

I'm glad for any help I can get

Regards

Stephan

  • Stephan,

    Have you tried increasing the PBM memory, using XGCONF to open the .cfg file?

    Regards,

    Garrett

  • Hi Garrett,

    sorry to say this, but I don't think this is a good solution. It would be better if the stack released its unused buffers.

    Regards,

    Stephan

  • Hi Stephan,

    If you are using the PRU-ICSS-ETHERNETIP-ADAPTER 01_00_03_04 release, please try to apply the patch below, which should fix the buffer allocation failure you are observing.

    3678.TCP_and_DHCP_failure_fix.patch

    Regards,

    Garrett

  • Hi Garett,

    sorry it didn't help. But I saw that something is missing in the diff:

    third_party/protocols/ptp/ptpd/dep/net.c, line 1173:

      /* Setup fd_set structure for select function */

    - FD_ZERO(readfds);
    + NDK_FD_ZERO(readfds);
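
    For context, the NDK prefixes the standard BSD fdset macros with NDK_ (NDK_FD_ZERO, NDK_FD_SET, NDK_FD_ISSET), so a plain FD_ZERO manipulates the wrong structure in that build. A minimal POSIX sketch of the select() pattern the patched code uses (names here are the standard POSIX ones, not the NDK_-prefixed variants):

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll whether `fd` is readable right now, without blocking.
 * In the NDK port, FD_ZERO/FD_SET/FD_ISSET below would be the
 * NDK_-prefixed variants, which is what the patch missed. */
static int fd_is_readable(int fd)
{
    fd_set readfds;
    struct timeval tv = { 0, 0 };    /* zero timeout: poll only */

    FD_ZERO(&readfds);               /* must clear the set first */
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return 0;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```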

    Regards,

    Stephan

  • Hi Stephan,

    Is the issue still pending? I was sidetracked for a few days...

    Regards,

    Garrett

  • Hi Garrett,

    yes the issue is still pending as the patch didn't help.

    I will try to provide you with a clean trace for clarification, but I have to write a small program for that first.

    Regards,

    Stefan

  • Hi Garret,

    I prepared two traces for you.

    In the good trace I open 17 TCP connections to port 44818 and close them. Everything is fine afterwards.

    In the bad trace I try to open 18 TCP connections to port 44818, but after the 17th connection the connection pool is exhausted and a timeout occurs. Afterwards it is not possible to communicate with the device via any protocol.

    I have provided the tool, with source, to open TCP connections in a separate archive.

    Regards,

    Stefan

    5187.traces.zip
    traces_tool.zip

  • Hi Stefan,

    I could reproduce the issue with your SYN tool and tried updating the TCP buffer, but it didn't help. I will work with our team to sort it out...

    Regards,

    Garrett

  • Stefan,

    After looking into your tcp.c file, I noticed the issue actually occurs when you try to connect more than 17 sockets simultaneously, rather than when creating and then closing a socket 17 times.

    >>"It would be better if the stack would release unused buffer" 

    Which unused buffer were you referring to?

    The "TI-RTOS Networking Stack Memory Usage" info should help 

    Regards,
    Garrett

  • Stefan,

    Not sure how many sockets you need open in total, but here are the options to update the maximum number of connections.

    Regards,

    Garrett

  • Hi Garrett,

    sorry for the late response, and thank you for the proposed solution. The problem is that the device will be tested with a robustness tool which opens the maximum number of connections, so it wouldn't make a difference whether the device has 16 or 256 available sockets.

    Could you ask the development team why this is happening?

    Thank you in advance

    Regards

    Stefan

  • Stefan,

    If you look into the function SlNetSock_AllocVirtualSocket() in ns_2_40_0x_0y\source\ti\net\slnetsock.c:

    //*****************************************************************************
    //
    // SlNetSock_AllocVirtualSocket - Search for free space in the VirtualSockets
    // array and allocate a socket in this location
    //
    //*****************************************************************************
    static int32_t SlNetSock_AllocVirtualSocket(int16_t *virtualSdIndex, SlNetSock_VirtualSocket_t **newSocketNode)
    {
        ...
        /* Search for free space in the VirtualSockets array */
        while ( arrayIndex < SLNETSOCK_MAX_CONCURRENT_SOCKETS )
        {
            /* Check if the arrayIndex in VirtualSockets is free */
            if ( NULL == VirtualSockets[arrayIndex] )
            {
                /* Allocate memory for new socket node for the socket list */
                *newSocketNode = (SlNetSock_VirtualSocket_t *)calloc(1, sizeof(SlNetSock_VirtualSocket_t));
                ...

    The limit itself comes from this define:

    #define SLNETSOCK_MAX_CONCURRENT_SOCKETS (32) /**< Declares the maximum sockets that can be opened */

    This should explain where the maximum socket number comes from.

    Regards,
    Garrett

  • Hi Garrett,

    thank you, but I don't think the TI Network Services are used in PRU-ICSS, as it uses the NDK with its BSD API.

    Nonetheless, I'm more interested in getting this behaviour fixed, as otherwise the device won't be approved by our customer's quality management.

    It's enough for me to see where it's happening, as I will then deny the connections a little bit earlier. --> This behaviour shouldn't occur anymore.

    Any chance to get this done?

    Regards,

    Stefan

  • Stefan,

    You are right - the Network Services are not used in the EtherNet/IP application. I have reached out to our NDK team to check why updating TI_NDK_SOCKET_MAX_FD in the global configuration doesn't increase the maximum number of sockets accepted, and also to see if it's possible for the NDK to give the application an early indication so it can deny connections.

    Regards,

    Garrett

  • Hi Garrett,

    there is another problem regarding your patch from Dec 5, 2019 2:42 PM in this thread:

    • It causes some problems with ACD
    • If you activate ACD and unplug/replug the only Ethernet cable, the device will freeze

    Could you take a look at this?

    Thank you and regards,

    Stefan

  • Stefan,

    I am still trying to resolve the socket connection issue...

    Does the ACD issue occur only after the patch is applied, i.e. is there no freeze issue without the patch?

     Regards,

    Garrett

  • Hi Garrett,

    "Does the ACD issue occur only after the patch is applied i.e there is no code freeze issue without the patch?"

    Yes, this is the case.

    Regards,

    Stefan

  • Stefan,

    We are able to connect more sockets after increasing the maxcon parameter - the maximum number of connections to queue - in NDK_listen(), which is called by the EIP stack and set to 1 in the release: int NDK_listen(SOCKET s, int maxcon);
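
    For reference, maxcon corresponds to the backlog argument of BSD listen(). A minimal POSIX sketch (standard sockets API on a host, not the NDK_-prefixed embedded build) of creating a TCP listener with a larger pending-connection queue:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1 with the given backlog.
 * Returns the listening fd and writes the chosen port to *port_out,
 * or returns -1 on failure. */
static int make_listener(int backlog, unsigned short *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the OS pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {           /* backlog: queued connects */
        close(fd);
        return -1;
    }

    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&addr, &len);
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

    A backlog of 1 (as in the release) means only one pending connection can be queued before further connection attempts stall, which fits the simultaneous-connect symptom described above.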

    In terms of an early indication in order to deny connections: per the discussion with the NDK team, it will be difficult to implement, as the maximum number of connections is limited by memory size.

    I will be looking into the ACD issue...

    Regards,

    Garrett

  • Hi Garrett,

    my issue is not the number of connections which can be opened. 6 would be enough...

    My issue is that this device will be tested and has to survive the maximum possible number of connections. There will be a machine which can open up to 2^32 connections (this number is fictional).

    Regards,

    Stefan

  • Hi Garrett,

    could you give me a status update?

    Thank you and regards,

    Stefan

  • Stefan,

    We were able to resolve the pbm_alloc() issue by increasing the PBM count, and also got more sockets connected by changing the maxcon argument (maximum number of TCP connections to queue) in listen() as called by the EIP stack. However, accept() neither returns INVALID_SOCKET nor sets errno (fdError in the NDK) on AM335x when running out of sockets, which the application could otherwise use to handle reaching the maximum number of connections and survive future connect() calls. We are actively looking into it and suspect the root cause is in the EIP stack, as the symptom doesn't occur on another platform (MSP432E4) with a test application.
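
    As a sketch of what the application could do if accept() did report the failure: classify the error and back off rather than treating the socket path as dead. The helper name and the exact set of errno values are illustrative, not from the NDK:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classifier for an accept() failure: transient
 * resource exhaustion (back off, keep serving existing sockets)
 * versus a genuinely fatal error. */
static bool accept_error_is_transient(int err)
{
    switch (err) {
    case EMFILE:    /* per-process descriptor limit reached  */
    case ENFILE:    /* system-wide descriptor limit reached  */
    case ENOBUFS:   /* no buffer space (a PBM-style squeeze) */
    case ENOMEM:    /* out of memory in the stack            */
    case EAGAIN:    /* nothing pending on a nonblocking fd   */
        return true;
    default:
        return false;
    }
}
```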

    With regard to the ACD issue, it's reproducible in my setup as well, and I have asked our development team to prioritize a solution.

    Thanks,
    Garrett

  • Garrett,

    Do you have any updates regarding this thread?

    Thanks,
    Brad

  • Stefan B said:

    my issue is not the amount of connections which can be opened. 6 would be enough...

    My issue is that there will a test with this device and it have to survive the maximum possible amount of connections. There will be a machine which can open up to 2^32 connections (this number is fictional).

    Stefan,

    I'm not following your statements.  You mention 6 connections is enough, but then you say you might open up to 2^32 connections.  Which is it?  I think there's a distinction here that I'm not following.

    Is this issue still open, or is the ACD issue the only thing you're trying to resolve at this point?

    Thanks,
    Brad

  • Hi Brad,

    sorry maybe I wasn't clear enough:

    • I'm able to get the amount of connections I want
    • But we have a robustness test which opens TCP connections to all open ports until they are exhausted. Then all the connections are closed.
    • --> The device has to survive this and be able to communicate normally afterwards. This is unfortunately not the case.

    The ACD issue is an independent issue which was introduced when a fix for the DHCP client was implemented. ACD was working fine before the DHCP fix. The patch for this fix can be found in this thread.

    Thank you and regards,

    Stefan

  • Stefan,

    Thanks for the info. I apologize for the delay. Garrett is traveling this week, but we've pulled in some other developers to help out too. It looks like we have a patch for the ACD issue; I hope to follow up with that later today. We still need more time on the issue with the number of connections. We have other developers who are going to reproduce the issue and debug further.

    Best regards,
    Brad

  • Stefan,

    I updated one of your earlier threads with an updated version of the patch.  I see that same patch applies here too.  We believe it will fix the ACD issue that was introduced.  Please apply the patch that I posted here:

    https://e2e.ti.com/support/processors/f/791/p/836115/3257031#3257031

    Let us know if that resolves the ACD issue.  We are still working on the other issue and should have further updates next week.

    Best regards,
    Brad

  • Stefan,

    Can you please describe how the maximum number of sockets was designed/tested with the previous industrial SDK? Does it expect accept() to return INVALID_SOCKET from EIP, then send a CIP message to notify the robustness tool to stop trying to connect? As the current number of sockets (6) is sufficient for your application, could the EIP application just count the opened sockets and, on reaching the maximum (6), deny any further connection?
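
    A sketch of that counting approach (all names here are made up for illustration): the application tracks open sockets and refuses further connections, e.g. by accept-then-close, once the ceiling is reached:

```c
#include <stdbool.h>

/* Hypothetical connection gate for the counting approach:
 * track open application sockets against a configured ceiling. */
typedef struct {
    int open;   /* currently open application sockets          */
    int max;    /* configured ceiling, e.g. 6 for this adapter */
} conn_gate_t;

/* Returns true and takes a slot if capacity remains; false = deny. */
static bool conn_gate_admit(conn_gate_t *g)
{
    if (g->open >= g->max)
        return false;          /* at capacity: caller should refuse */
    g->open++;
    return true;
}

/* Release a slot when a connection closes; guarded against
 * the double-release miscounts Stefan is worried about below. */
static void conn_gate_release(conn_gate_t *g)
{
    if (g->open > 0)
        g->open--;
}
```

    As Stefan notes in his reply, the weak point of any such counter is keeping it consistent when connections are reset abnormally, which is why he prefers a stack-level fix.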

    Regards,

    Garrett

  • Hi Garett,

    I'm sorry for responding so late but I had to examine the issues:

    • The ACD fix worked well. But there is still a timing issue when the IP gets assigned in the NDK and a directed ARP is received at that moment. In this case the ARP will be ignored.
    • Much more interesting is the socket issue. I tried to copy the TCP implementation from Molex to another port in a much simpler way. On this other port the IP stack seems to be more robust: I can kill the port with many connections, but I can't kill the IP stack the way the Molex implementation does. So I think there is a problem in the Molex implementation. This must have changed over time, as it didn't exist in V3.3.2.

    I don't like the approach of "counting connections". Over time there will be a miscount when connections are reset in bad situations or similar. This would only lead to many problems in the field on the customer side.

    Now I will take a look at the changes in the Molex stack. Recently there was a bugfix for EPIC_PoolServerClose() and I think the problem has to do with it (e2e.ti.com/.../3131551).

    Thank you for your efforts. But it would be very interesting to know why the NDK resources vanish with the Molex stack. Is it possible for you to localize this problem?

    Best regards,

    Stefan

  • Hi Stefan,

    Understood. The ARP timing issue is a separate one and unrelated to the ACD fix, correct? Could you please elaborate on how it can be reproduced at our end?

    With regard to the socket issue, TMG has been helping look into it and plans to add a check in the function SoTcpAccept in the stack, comparing the number of sockets opened in EIP against the value of NDK_FD_SETSIZE from the NDK. The value of NDK_FD_SETSIZE is defined as 16 in the NDK's socketndk.h and is not changeable, but the number of sockets supported by the EtherNet/IP stack, EIP_NB_SOCKET_MAX, is defined in adt_config.h or user_adt_config.h. These two defines are currently uncorrelated, which results in the behavior you are observing.
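
    The kind of consistency check described could look roughly like this. The macro values are stand-ins for NDK_FD_SETSIZE and EIP_NB_SOCKET_MAX (the real check would live in SoTcpAccept, and the real names come from the NDK and EIP headers):

```c
#include <stdbool.h>

#define EXAMPLE_NDK_FD_SETSIZE     16  /* stand-in for NDK_FD_SETSIZE    */
#define EXAMPLE_EIP_NB_SOCKET_MAX  16  /* stand-in for EIP_NB_SOCKET_MAX */

/* A new connection may be accepted only while both the EIP socket
 * budget and the NDK fd_set capacity still have room for it. */
static bool can_accept_socket(int open_sockets)
{
    int limit = EXAMPLE_EIP_NB_SOCKET_MAX < EXAMPLE_NDK_FD_SETSIZE
                    ? EXAMPLE_EIP_NB_SOCKET_MAX
                    : EXAMPLE_NDK_FD_SETSIZE;
    return open_sockets < limit;
}
```

    Taking the minimum of the two limits is what makes the previously uncorrelated defines safe against each other.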

    Regards,
    Garrett

  • Hi Garrett,

    if you want to reproduce the ACD issue, you just have to download the "Address Conflict Detection Test Tool" by HMS from ODVA.org, then start test 8.5.15. It's pretty easy and fast to do.

    Do you know how the numbers of sockets have to be related for this to work properly? It's no problem for me to increase them and recompile the NDK stack.

    Thank you and best regards,

    Stefan

  • Stefan,

    Thanks, will do.

    Rebuilding the NDK is not enough, as several NDK_FD_SET() calls have been inlined in the EIP and ptpd stacks, and there is also a direct use of NDK_FD_SETSIZE in the EIP stack library. The entire EIP and PTP libraries have to be rebuilt after an NDK_FD_SETSIZE update. The default value of 16 is OK for a maximum of 16 sockets, but the stack needs to be updated to check the number of open sockets against the value of NDK_FD_SETSIZE.

    Regards,

    Garrett

  • Stefan,

    A bug fix in NDK v3.50.00+ may be needed as well to address the issue after SoTcpAccept() is updated:

    https://sir.ext.ti.com/jira/browse/EXT_EP-9040

    NDK_socket/accept needs to honor the config parameter maxSockFileDesc.

    In order to do this, NDK_accept needs to be notified if there was an error creating a new accepted socket. This would also fix an issue where NDK_accept does not return NDK_ENOMEM when a socket could not be created because the NDK ran out of memory.

    The change is in src/ti/ndk/stack/sock/sockpcb.c.


    Regards,

    Garrett

  • Hi Garrett,

    I've implemented the fix and reduced the number of TCP sockets in the EtherNet/IP stack.

    --> the communication is still alive after the TCP socket exhaustion

    Thank you for that and best regards,
    Stefan

  • Stefan B said:

    I've implemented the fix and reduced the number of TCP sockets in the EtherNet/IP stack.

    --> the communication is still alive after the TCP socket exhaustion

    So did this resolve your issue related to the number of TCP sockets?

    Is there an issue related to ARP as well?  Was that caused by something we did in this thread?  If not, please start a new thread for that issue.  (You can link to it if you like, but let's not debug here if it's not directly related.)

    Please comment on whether this thread is resolved at this time.  I think a summary from your side would be good, because I'm unclear as to what issues are still open.  Thanks!

  • Hi Brad,

    I'll try to give a small summary:

    • TCP socket exhaustion led to PBM allocation issues
      --> fixed by adapting the socket limits in the EtherNet/IP stack and the NDK stack
      --> the NDK now respects the backlog
    • The DHCP fix led to problems with ACD
      --> the DHCP fix was corrected in another thread
    • Further ACD problems
      --> didn't really exist; release firmware with better optimization fixed the timing issues

    I would say everything in this thread is fixed.

    Thank you to all involved and best regards
    Stefan