This thread has been locked.


CCS/TMDSICE3359: EtherNet/IP PBM allocation issue

Part Number: TMDSICE3359

Tool/software: Code Composer Studio

Hi

I was testing the TMDSICE3359 with EtherNet/IP using a robustness tool and ran into a problem with PBM allocation. After simultaneously establishing and releasing 17 TCP connections to port 44818, it is no longer possible to receive any packets. The PBM allocation in NIMU_ICSS_rxPacket() fails and memory_squeeze_error is incremented continuously.

I'm glad for any help I can get

Regards

Stephan

  • Stephan,

    Have you tried increasing the PBM memory, using XGCONF to open the .cfg file?

    Regards,

    Garrett

  • Hi Garrett,

    sorry to say this, but I don't think this is a good solution. It would be better if the stack released its unused buffers.

    Regards,

    Stephan

  • Hi Stephan,

    If you are using the PRU-ICSS-ETHERNETIP-ADAPTER 01_00_03_04 release, please try to apply the patch below, which should fix the buffer allocation failure you are observing.

    3678.TCP_and_DHCP_failure_fix.patch

    Regards,

    Garrett

  • Hi Garett,

    sorry it didn't help. But I saw that something is missing in the diff:

    third_party/protocols/ptp/ptpd/dep/net.c, line 1173:

      /* Setup fd_set structure for select function */

    - FD_ZERO(readfds);
    + NDK_FD_ZERO(readfds);
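
    For context, the NDK prefixes the standard BSD fdset macros with NDK_ (NDK_FD_ZERO, NDK_FD_SET, NDK_FD_ISSET), so a plain FD_ZERO manipulates the wrong structure in that build. A minimal POSIX sketch of the select() pattern the patched code uses (names here are the standard POSIX ones, not the NDK_-prefixed variants):

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll whether `fd` is readable right now, without blocking.
 * In the NDK port, FD_ZERO/FD_SET/FD_ISSET below would be the
 * NDK_-prefixed variants, which is what the patch missed. */
static int fd_is_readable(int fd)
{
    fd_set readfds;
    struct timeval tv = { 0, 0 };    /* zero timeout: poll only */

    FD_ZERO(&readfds);               /* must clear the set first */
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, &tv) <= 0)
        return 0;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```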

    Regards,

    Stephan

  • Hi Stephan,

    Is the issue still pending? I was sidetracked for a few days...

    Regards,

    Garrett

  • Hi Garrett,

    yes the issue is still pending as the patch didn't help.

    I will try to provide you with a clean trace for clarification, but I have to write a small program for that first.

    Regards,

    Stefan

  • Hi Garret,

    I prepared two traces for you.

    In the good trace I open 17 TCP connections to port 44818 and close them. Everything is fine afterwards.

    In the bad trace I try to open 18 TCP connections to port 44818, but after the 17th connection the connection pool is exhausted and a timeout occurs. Afterwards it is not possible to communicate with the device via any protocol.

    I have provided the tool, with source, to open TCP connections in a separate archive.

    Regards,

    Stefan

    5187.traces.zip
    traces_tool.zip

  • Hi Stefan,

    I could reproduce the issue with your SYN tool and tried updating the TCP buffer, but it didn't help. I will work with our team to sort it out...

    Regards,

    Garrett

  • Stefan,

    After looking into your tcp.c file, I noticed the issue actually occurs when you try to connect more than 17 sockets simultaneously, rather than when creating and then closing a socket 17 times.

    >>"It would be better if the stack would release unused buffer" 

    Which unused buffer were you referring to?

    The "TI-RTOS Networking Stack Memory Usage" info should help 

    Regards,
    Garrett

  • Stefan,

    Not sure how many sockets you need open in total, but here are the options to update the maximum number of connections.

    Regards,

    Garrett

  • Hi Garrett,

    sorry for the late response, and thank you for the proposed solution. The problem is that the device will be tested with a robustness tool which opens the maximum number of connections, so it wouldn't make a difference whether the device has 16 or 256 available sockets.

    Could you ask the development team why this is happening?

    Thank you in advance

    Regards

    Stefan

  • Stefan,

    If you look into the function SlNetSock_AllocVirtualSocket() in ns_2_40_0x_0y\source\ti\net\slnetsock.c:

    //*****************************************************************************
    //
    // SlNetSock_AllocVirtualSocket - Search for free space in the VirtualSockets
    // array and allocate a socket in this location
    //
    //*****************************************************************************
    static int32_t SlNetSock_AllocVirtualSocket(int16_t *virtualSdIndex, SlNetSock_VirtualSocket_t **newSocketNode)
    {
        ...
        /* Search for free space in the VirtualSockets array */
        while ( arrayIndex < SLNETSOCK_MAX_CONCURRENT_SOCKETS )
        {
            /* Check if the arrayIndex in VirtualSockets is free */
            if ( NULL == VirtualSockets[arrayIndex] )
            {
                /* Allocate memory for new socket node for the socket list */
                *newSocketNode = (SlNetSock_VirtualSocket_t *)calloc(1, sizeof(SlNetSock_VirtualSocket_t));
                ...

    The limit itself comes from this define:

    #define SLNETSOCK_MAX_CONCURRENT_SOCKETS (32) /**< Declares the maximum sockets that can be opened */

    This should explain where the maximum socket number comes from.

    Regards,
    Garrett

  • Hi Garrett,

    thank you, but I don't think the TI Network Services are used in PRU-ICSS, as it uses the NDK with its BSD API.

    Nonetheless, I'm more interested in getting this behaviour fixed, as otherwise the device won't be approved by our customer's quality management.

    It's enough for me to see where it's happening, as I will then deny the connections a little bit earlier. --> This behaviour shouldn't occur anymore.

    Any chance to get this done?

    Regards,

    Stefan

  • Stefan,

    You are right - the Network Services are not used in the EtherNet/IP application. I have reached out to our NDK team to check why updating TI_NDK_SOCKET_MAX_FD in the global configuration doesn't increase the maximum number of sockets accepted, and also to see if it's possible for the NDK to give the application an early indication so it can deny connections.

    Regards,

    Garrett

  • Hi Garrett,

    there is another problem regarding your patch from Dec 5, 2019 2:42 PM in this thread:

    • It causes some problems with ACD
    • If you activate ACD and unplug/replug the only Ethernet cable, the device will freeze

    Could you take a look at this?

    Thank you and regards,

    Stefan

  • Stefan,

    I am still trying to resolve the socket connection issue...

    Does the ACD issue occur only after the patch is applied, i.e. is there no freeze issue without the patch?

     Regards,

    Garrett

  • Hi Garrett,

    "Does the ACD issue occur only after the patch is applied i.e there is no code freeze issue without the patch?"

    Yes, this is the case.

    Regards,

    Stefan

  • Stefan,

    We are able to connect more sockets after increasing the maxcon parameter - the maximum number of connections to queue - in NDK_listen(), which is called by the EIP stack and set to 1 in the release: int NDK_listen(SOCKET s, int maxcon);
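
    For reference, maxcon corresponds to the backlog argument of BSD listen(). A minimal POSIX sketch (standard sockets API on a host, not the NDK_-prefixed embedded build) of creating a TCP listener with a larger pending-connection queue:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener on 127.0.0.1 with the given backlog.
 * Returns the listening fd and writes the chosen port to *port_out,
 * or returns -1 on failure. */
static int make_listener(int backlog, unsigned short *port_out)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the OS pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {           /* backlog: queued connects */
        close(fd);
        return -1;
    }

    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&addr, &len);
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

    A backlog of 1 (as in the release) means only one pending connection can be queued before further connection attempts stall, which fits the simultaneous-connect symptom described above.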

    In terms of an early indication in order to deny connections: per the discussion with the NDK team, it will be difficult to implement, as the maximum number of connections is limited by memory size.

    I will be looking into the ACD issue...

    Regards,

    Garrett

  • Hi Garrett,

    my issue is not the number of connections which can be opened. 6 would be enough...

    My issue is that this device will be tested and has to survive the maximum possible number of connections. There will be a machine which can open up to 2^32 connections (this number is fictional).

    Regards,

    Stefan

  • Hi Garrett,

    could you give me a status update?

    Thank you and regards,

    Stefan

  • Stefan,

    We were able to resolve the pbm_alloc() issue by increasing the PBM count, and also got more sockets connected by changing the maxcon argument (maximum number of TCP connections to queue) in listen() as called by the EIP stack. However, accept() neither returns INVALID_SOCKET nor sets errno (fdError in the NDK) on AM335x when running out of sockets, which the application could otherwise use to handle reaching the maximum number of connections and survive future connect() calls. We are actively looking into it and suspect the root cause is in the EIP stack, as the symptom doesn't occur on another platform (MSP432E4) with a test application.
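
    As a sketch of what the application could do if accept() did report the failure: classify the error and back off rather than treating the socket path as dead. The helper name and the exact set of errno values are illustrative, not from the NDK:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classifier for an accept() failure: transient
 * resource exhaustion (back off, keep serving existing sockets)
 * versus a genuinely fatal error. */
static bool accept_error_is_transient(int err)
{
    switch (err) {
    case EMFILE:    /* per-process descriptor limit reached  */
    case ENFILE:    /* system-wide descriptor limit reached  */
    case ENOBUFS:   /* no buffer space (a PBM-style squeeze) */
    case ENOMEM:    /* out of memory in the stack            */
    case EAGAIN:    /* nothing pending on a nonblocking fd   */
        return true;
    default:
        return false;
    }
}
```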

    With regard to the ACD issue, it's reproducible in my setup as well, and I have asked our development team to prioritize a solution.

    Thanks,
    Garrett

  • Garrett,

    Do you have any updates regarding this thread?

    Thanks,
    Brad

  • Stefan B said:

    my issue is not the amount of connections which can be opened. 6 would be enough...

    My issue is that there will a test with this device and it have to survive the maximum possible amount of connections. There will be a machine which can open up to 2^32 connections (this number is fictional).

    Stefan,

    I'm not following your statements.  You mention 6 connections is enough, but then you say you might open up to 2^32 connections.  Which is it?  I think there's a distinction here that I'm not following.

    Is this issue still open, or is the ACD issue the only thing you're trying to resolve at this point?

    Thanks,
    Brad

  • Hi Brad,

    sorry maybe I wasn't clear enough:

    • I'm able to get the amount of connections I want
    • But we have a robustness test which opens TCP connections to all open ports until they are exhausted. Then all the connections are closed.
    • --> The device has to survive this and be able to communicate normally afterwards. This is unfortunately not the case.

    The ACD issue is an independent issue which was introduced when a fix for the DHCP client was implemented. ACD was working fine before the DHCP fix. The patch for this fix can be found in this thread.

    Thank you and regards,

    Stefan

  • Stefan,

    Thanks for the info. I apologize for the delay. Garrett is traveling this week, but we've pulled in some other developers to help out too. It looks like we have a patch for the ACD issue; I hope to follow up with that later today. We still need more time on the issue with the number of connections. We have other developers who are going to reproduce the issue and debug further.

    Best regards,
    Brad

  • Stefan,

    I updated one of your earlier threads with an updated version of the patch.  I see that same patch applies here too.  We believe it will fix the ACD issue that was introduced.  Please apply the patch that I posted here:

    https://e2e.ti.com/support/processors/f/791/p/836115/3257031#3257031

    Let us know if that resolves the ACD issue.  We are still working on the other issue and should have further updates next week.

    Best regards,
    Brad

  • Stefan,

    Can you please describe how the maximum number of sockets was designed/tested with the previous industrial SDK? Does it expect accept() to return INVALID_SOCKET from EIP, then send a CIP message to notify the robustness tool to stop trying to connect? As the current number of sockets (6) is sufficient for your application, could the EIP application just count the opened sockets and, on reaching the maximum (6), deny any further connection?
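
    A sketch of that counting approach (all names here are made up for illustration): the application tracks open sockets and refuses further connections, e.g. by accept-then-close, once the ceiling is reached:

```c
#include <stdbool.h>

/* Hypothetical connection gate for the counting approach:
 * track open application sockets against a configured ceiling. */
typedef struct {
    int open;   /* currently open application sockets          */
    int max;    /* configured ceiling, e.g. 6 for this adapter */
} conn_gate_t;

/* Returns true and takes a slot if capacity remains; false = deny. */
static bool conn_gate_admit(conn_gate_t *g)
{
    if (g->open >= g->max)
        return false;          /* at capacity: caller should refuse */
    g->open++;
    return true;
}

/* Release a slot when a connection closes; guarded against
 * the double-release miscounts Stefan is worried about below. */
static void conn_gate_release(conn_gate_t *g)
{
    if (g->open > 0)
        g->open--;
}
```

    As Stefan notes in his reply, the weak point of any such counter is keeping it consistent when connections are reset abnormally, which is why he prefers a stack-level fix.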

    Regards,

    Garrett

  • Hi Garett,

    I'm sorry for responding so late but I had to examine the issues:

    • The ACD fix worked well. But there is still a timing issue when the IP gets assigned in the NDK and a directed ARP is received at that moment. In this case the ARP will be ignored.
    • Much more interesting is the socket issue. I tried to copy the TCP implementation from Molex to another port in a much simpler way. On this other port the IP stack seems to be more robust: I can kill the port with many connections, but I can't kill the IP stack the way the Molex implementation does. So I think there is a problem in the Molex implementation. This must have changed over time, as it didn't exist in V3.3.2.

    I don't like the approach of "counting connections". Over time there will be a miscount when connections are reset in bad situations or similar. This would only lead to many problems in the field on the customer side.

    Now I will take a look at the changes in the Molex stack. Recently there was a bugfix for EPIC_PoolServerClose() and I think the problem has to do with it (e2e.ti.com/.../3131551).

    Thank you for your efforts. But it would be very interesting to know why the NDK resources vanish with the Molex stack. Is it possible for you to localize this problem?

    Best regards,

    Stefan

  • Hi Stefan,

    Understood. The ARP timing issue is a separate one and unrelated to the ACD fix, correct? Could you please elaborate on how it can be reproduced at our end?

    With regard to the socket issue, TMG has been helping look into it and plans to add a check in the function SoTcpAccept in the stack, comparing the number of sockets opened in EIP against the value of NDK_FD_SETSIZE from the NDK. The value of NDK_FD_SETSIZE is defined as 16 in the NDK's socketndk.h and is not changeable, but the number of sockets supported by the EtherNet/IP stack, EIP_NB_SOCKET_MAX, is defined in adt_config.h or user_adt_config.h. These two defines are currently uncorrelated, which results in the behavior you are observing.
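
    The kind of consistency check described could look roughly like this. The macro values are stand-ins for NDK_FD_SETSIZE and EIP_NB_SOCKET_MAX (the real check would live in SoTcpAccept, and the real names come from the NDK and EIP headers):

```c
#include <stdbool.h>

#define EXAMPLE_NDK_FD_SETSIZE     16  /* stand-in for NDK_FD_SETSIZE    */
#define EXAMPLE_EIP_NB_SOCKET_MAX  16  /* stand-in for EIP_NB_SOCKET_MAX */

/* A new connection may be accepted only while both the EIP socket
 * budget and the NDK fd_set capacity still have room for it. */
static bool can_accept_socket(int open_sockets)
{
    int limit = EXAMPLE_EIP_NB_SOCKET_MAX < EXAMPLE_NDK_FD_SETSIZE
                    ? EXAMPLE_EIP_NB_SOCKET_MAX
                    : EXAMPLE_NDK_FD_SETSIZE;
    return open_sockets < limit;
}
```

    Taking the minimum of the two limits is what makes the previously uncorrelated defines safe against each other.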

    Regards,
    Garrett

  • Hi Garrett,

    if you want to reproduce the ACD issue, you just have to download the "Address Conflict Detection Test Tool" by HMS from ODVA.org, then start test 8.5.15. It's pretty easy and fast to do.

    Do you know how the numbers of sockets have to be related for this to work properly? It's no problem for me to increase them and recompile the NDK stack.

    Thank you and best regards,

    Stefan

  • Stefan,

    Thanks, will do.

    Rebuilding the NDK is not enough, as several NDK_FD_SET() calls have been inlined in the EIP and ptpd stacks, and there is also a direct use of NDK_FD_SETSIZE in the EIP stack library. The entire EIP and PTP libraries have to be rebuilt after an NDK_FD_SETSIZE update. The default value of 16 is OK for a maximum of 16 sockets, but the stack needs to be updated to check the number of open sockets against the value of NDK_FD_SETSIZE.

    Regards,

    Garrett

  • Stefan,

    A bug fix in NDK v3.50.00+ may be needed as well to address the issue after SoTcpAccept() is updated:

    https://sir.ext.ti.com/jira/browse/EXT_EP-9040

    NDK_socket/accept needs to honor the config parameter maxSockFileDesc.

    In order to do this, NDK_accept needs to be notified if there was an error creating a new accepted socket. This would also fix an issue where NDK_accept does not return NDK_ENOMEM when a socket could not be created because the NDK ran out of memory.

    The change is in src/ti/ndk/stack/sock/sockpcb.c.


    Regards,

    Garrett

  • Hi Garrett,

    I've implemented the fix and reduced the number of TCP sockets in the EtherNet/IP stack.

    --> the communication is still alive after the TCP socket exhaustion

    Thank you for that and best regards,
    Stefan

  • Stefan B said:

    I've implemented the fix and reduced the number of TCP sockets in the EtherNet/IP stack.

    --> the communication is still alive after the TCP socket exhaustion

    So did this resolve your issue related to the number of TCP sockets?

    Is there an issue related to ARP as well?  Was that caused by something we did in this thread?  If not, please start a new thread for that issue.  (You can link to it if you like, but let's not debug here if it's not directly related.)

    Please comment on whether this thread is resolved at this time.  I think a summary from your side would be good, because I'm unclear as to what issues are still open.  Thanks!

  • Hi Brad,

    I'll try to give a small summary:

    • TCP socket exhaustion led to PBM allocation issues
      --> fixed by adapting the socket limits in the EtherNet/IP stack and the NDK stack
      --> the NDK now respects the backlog
    • The DHCP fix led to problems with ACD
      --> the DHCP fix was corrected in another thread
    • Further ACD problems
      --> didn't really exist; release firmware with better optimization fixed the timing issues

    I would say everything in this thread is fixed.

    Thank you to all involved and best regards
    Stefan