Stack overflow when calling NDK function fdSelect()

David Aldrich92784


I am using NDK 2.20.04.26 and am working with the NDK helloWorld example.

I have added code to poll an open TCP/IP socket for data, using fdSelect:

timeval timeOut;
timeOut.tv_sec = 0;
timeOut.tv_usec = a_uSecTimeout;

fd_set readFDs;
FD_ZERO(&readFDs);
FD_SET(a_socket, &readFDs);

int retVal = fdSelect( 0, &readFDs, 0, 0, &timeOut );

This works fine the first few times that I call it but then a stack overflow occurs in the fdSelect() call:

ti.sysbios.knl.Task: line 345: E_spOutOfBounds: Task 0xc0b9b28 stack error, SP = 0xc0a6cd0.

xdc.runtime.Error.raise: terminating execution

I guess the logical thing to do is to increase the size of the stack for that task. However, the task is spawned by the TCP/IP server daemon:

hHello = DaemonNew( SOCK_STREAMNC, inet_addr(LocalIPAddr), 3000, dtask_tcpip_server, OS_TASKPRINORM, OS_TASKSTKNORM, 0, 3 );

Changing OS_TASKSTKNORM to OS_TASKSTKHIGH makes no improvement.

How can I debug or fix this please?

Best regards

David

over 12 years ago

0 Steven Connell over 12 years ago

TI__Mastermind 45025 points

Hi David,

Which version of BIOS and CCS are you using? Which hardware platform?

Can you please open the ROV tool when this happens? It's in CCS under the menu 'tools -> ROV'. ROV will show you some details on the Tasks that are in your system, including stack sizes and peak usage.

You can use the peak to get an idea of how much to increase your stack. For example, when I see a problem like this I will change the Task stack size to be a bit bigger than the peak size shown in the ROV tool and keep increasing until the app runs OK. Then I again check the peak usage and tweak the stack size accordingly.
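As background on where a "peak" figure like this can come from, here is a minimal sketch of the usual stack watermark technique: the stack is pre-filled with a known byte, and the peak is whatever no longer holds that byte. This is plain C for illustration, not the SYS/BIOS implementation. The fill value 0xBE, the function names and the descending-stack layout are all assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Assumed fill pattern for this sketch; the real value used by the RTOS may differ. */
#define STACK_FILL_BYTE 0xBEu

/* Pre-fill a stack buffer so that later use can be detected. */
void stack_fill(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/*
 * Estimate peak usage of a descending stack (one that grows from the
 * highest address toward the lowest). Bytes at the low end that still
 * hold the fill pattern are counted as never used.
 */
size_t stack_peak(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return size - untouched;
}

/* Hypothetical helper: peak plus a safety margin given in percent. */
size_t suggested_stack_size(size_t peak, size_t margin_percent)
{
    return peak + (peak * margin_percent) / 100;
}
```

A measurement like this only sees bytes that were actually overwritten, and it reports nothing useful if the stack pointer ends up outside the buffer altogether, so a small peak does not by itself prove the task is healthy.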

You can also choose any stack size you like to pass into the DaemonNew() function (meaning you can just pass a size such as 16384, you aren't required to use the OS_TASKSTK* values).

Steve

0 David Aldrich92784 over 12 years ago in reply to Steven Connell

Genius 3720 points

Hi Steve

Thanks for your reply. We are using SYS/BIOS 6.32.4.49 and CCS 5.3.

Using ROV, I can see that stackPeak of dchild is 176 and stackSize is 5120. So the stack size looks alright.

I realise now that I have misinterpreted the E_spOutOfBounds exception. The stack is not exhausted but rather the stack pointer appears to be corrupted. Am I correct?

If so, I guess the task block has been corrupted, but I don't know how.

David

0 David Aldrich92784 over 12 years ago in reply to David Aldrich92784

Genius 3720 points

I have some more information about my problem:

At some point while running my TCP server (spawned by DaemonNew) I get an exception:

ti.sysbios.knl.Task: line 345: E_spOutOfBounds: Task 0xc0b9aa8 stack error, SP = 0xc0b6cc8.

xdc.runtime.Error.raise: terminating execution

CCS help states that this “Error [is] raised when a task's stack pointer (SP) does not point somewhere within the task's stack”.

It’s interesting to note that when my server callback dtask_tcpip_server() is entered, i.e. before the exception happens and before any of my code is executed, the Stack Pointer value is: 0x0C0B6EB0 and the ROV tool shows that the spawned task ‘dchild’ has values:

stackSize = 5120

stackBase = 0xc0c5b20

Now 0xc0c5b20 – 5120 = 0xc0c4720

So the stack space allocated to that task runs from 0xc0c4720 to 0xc0c5b20, and I don’t know why the SP has the value 0x0C0B6EB0 when we enter that task.
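The comparison above can be checked mechanically. The sketch below is plain C, not SYS/BIOS code, and the helper name sp_within is made up for this example. Which end of the stack ROV's stackBase refers to is an assumption either way, so the sketch tests both readings: stackBase as the top of the stack (the reading used in the arithmetic above) and stackBase as the lowest address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this post. */
#define STACK_BASE  0x0C0C5B20u  /* stackBase shown by ROV for 'dchild' */
#define STACK_SIZE  5120u        /* stackSize shown by ROV (0x1400)     */
#define SP_AT_ENTRY 0x0C0B6EB0u  /* SP on entry to dtask_tcpip_server() */

/* True if sp lies within [low, high]; both ends are treated as valid. */
bool sp_within(uint32_t sp, uint32_t low, uint32_t high)
{
    return sp >= low && sp <= high;
}

/* Reading 1 (assumption): stackBase is the highest address of the stack. */
bool sp_ok_if_base_is_top(uint32_t sp)
{
    return sp_within(sp, STACK_BASE - STACK_SIZE, STACK_BASE);
}

/* Reading 2 (assumption): stackBase is the lowest address of the stack. */
bool sp_ok_if_base_is_bottom(uint32_t sp)
{
    return sp_within(sp, STACK_BASE, STACK_BASE + STACK_SIZE);
}

/* How far sp sits below a given lowest address (0 if it is not below). */
uint32_t bytes_below(uint32_t sp, uint32_t low)
{
    return sp < low ? low - sp : 0u;
}
```

With the values quoted here, both checks fail: the SP on entry is lower than either candidate range, so the conclusion does not depend on which reading of stackBase is the right one.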

Any thoughts on the reason for this exception, or how to debug it, would be much appreciated please.

David

0 David Aldrich92784 over 12 years ago in reply to David Aldrich92784

Genius 3720 points

I am attaching my project files in case anyone at TI wants to look at them.

David

TcpIpServer.zip

0 David Aldrich92784 over 12 years ago in reply to David Aldrich92784

Genius 3720 points

Please will a TI engineer have another look at this issue for me? It is unresolved and urgent.

Thanks

David

0 Thomas Brown over 12 years ago in reply to David Aldrich92784

TI__Expert 5225 points

Hello David,

I still suspect stack exhaustion.

Even though the main thread 'StackTest' spawns a new thread 'dtask_tcpip_server', I think this still may be using the 'StackTest' stack, of 5120, using a hook to track the stack in the thread creation. I'm not sure if looking at peak usage of dchild will be the most revealing indicator. What about the peak usage of 'StackTest'?

I notice the main difference between the example and your code is that you are using tcp rather than udp, but more significantly, you are using the following to allocate a buffer:

unsigned* p_buffer = (unsigned*) malloc(BUFFER_SIZE_BYTES/4); //BUFFER_SIZE_BYTES = 5000

The example udp application passes the following to the receive API; I don't see any mallocs.

char *pBuf;
HANDLE hBuffer;

So I am curious about the differences between the two. It would be interesting to know which memory section your p_buffer comes from versus the example's pBuf.
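One detail of the quoted allocation may be worth checking while comparing the two: malloc() takes a size in bytes, so malloc(BUFFER_SIZE_BYTES/4) requests 1250 bytes rather than 5000. Whether that matters depends on how many bytes the receive call is asked to write into p_buffer, which isn't shown here. The sketch below is illustrative only and its helper names are made up for this example.

```c
#include <stddef.h>

#define BUFFER_SIZE_BYTES 5000  /* value from the comment on the malloc line */

/* Number of bytes actually requested by malloc(BUFFER_SIZE_BYTES/4). */
size_t allocated_bytes(void)
{
    return BUFFER_SIZE_BYTES / 4;
}

/* Non-zero if a write of write_len bytes stays inside the allocation. */
int write_fits(size_t allocation_bytes, size_t write_len)
{
    return write_len <= allocation_bytes;
}
```

If the intent was a buffer of BUFFER_SIZE_BYTES bytes addressed as unsigned values, the size passed to malloc() would be BUFFER_SIZE_BYTES itself, and BUFFER_SIZE_BYTES/4 would be the element count on a target where unsigned is 4 bytes wide.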

One test you could try would be to both increase the stack size and reduce BUFFER_SIZE_BYTES, then try receiving smaller packets.

If the problem persists then at least we can confidently rule out stack exhaustion and do some more hands-on diagnosis. I have spent most of my time today wrestling with my CCS installation; it has finally stabilised and I can build your project, so I will run some tests in the morning.

Regards,

Tom

0 David Aldrich92784 over 12 years ago in reply to Thomas Brown

Genius 3720 points

Hi Tom

Thanks very much for your reply. While waiting I decided to start writing the code again from scratch. I now have a simpler implementation and it's working ok.

Best regards

David
