NDK/TI-RTOS crashes on multiple HTTP requests at a very fast pace

MCU: TM4C1294NCPDT

TI-RTOS: v2.01.00.03

NDK: v2.23.01.01

CCS: v6.0.1.0040

Hi,

Following the HTTP example (processors.wiki.ti.com/.../TI-RTOS_HTTP_Example) to implement an embedded web server on the Tiva C MCU, I have noticed that when I press F5 (Refresh) on my computer keyboard repeatedly at a very fast pace (or press and hold it), the RTOS running on the Tiva C MCU crashes because the heap runs out of memory. At the beginning of the hook function for the .cgi file in question, I added the following to print the heap memory details:

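       Memory_Stats stats;

       /* NULL selects the default heap */
       Memory_getStats(NULL, &stats);

       /* Prints: total heap size, total free bytes, largest free block */
       System_printf("%d, %d, %d\n", stats.totalSize, stats.totalFreeSize, stats.largestFreeSize);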

 

This printed the following on the Code Composer Studio console in debug mode:

50000, 28808, 19584

50000, 35128, 25744

50000, 28808, 21632

50000, 24696, 21632

50000, 26664, 25760

50000, 24488, 19600

50000, 22312, 17552

50000, 20136, 15504

50000, 17960, 13456

50000, 15784, 11408

50000, 13608, 9360

50000, 11432, 7224

50000, 9256, 5088

50000, 7080, 4112

50000, 4904, 4112

{module#53}: line 307: out of memory: handle=0x20038078, size=2048

xdc.runtime.Error.raise: terminating execution
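
As a rough reading of the console output above: over the last ten samples, totalFreeSize falls by the same amount on every request, and the allocation that finally fails asks for 2048 bytes. The sketch below only redoes that subtraction. The array and the function name constant_drop are made up for illustration, and attributing the step to memory still held for earlier requests is an assumption, not something the log proves.

```c
#include <assert.h>
#include <stddef.h>

/* totalFreeSize values copied from the last ten console samples above */
const unsigned long free_bytes[] = {
    24488, 22312, 20136, 17960, 15784, 13608, 11432, 9256, 7080, 4904
};
const size_t num_samples = sizeof(free_bytes) / sizeof(free_bytes[0]);

/* Returns the drop between consecutive samples if every drop is the same,
 * or 0 if the samples do not shrink by one constant step. */
unsigned long constant_drop(const unsigned long *s, size_t n)
{
    unsigned long step;
    size_t i;

    assert(s != NULL);
    if (n < 2 || s[0] <= s[1]) {
        return 0;
    }
    step = s[0] - s[1];
    for (i = 1; i + 1 < n; i++) {
        if (s[i] < s[i + 1] || s[i] - s[i + 1] != step) {
            return 0;
        }
    }
    return step;
}
```

constant_drop(free_bytes, num_samples) gives 2176 bytes per request, slightly more than the 2048-byte allocation that fails at the end.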

 

This shows that the free heap shrank until an allocation failed, which crashed the NDK or the RTOS. I cannot ask the end user NOT to refresh the web page this quickly, so is there a way to handle this? It is acceptable if the NDK does not serve a page request just before the crash (or while heap memory is low), but the Tiva C MCU must not crash!

 

Thanks

 

Regards

Soumyajit Das

  • Soumyajit,

    There is a wiki topic describing how to adjust NDK memory usage, here: http://processors.wiki.ti.com/index.php/TI-RTOS_Networking_Stack_Memory_Usage  This might help.

    Eventually though, if there is a flood of requests, some memory allocations will fail.  Since your application is terminating… have you changed the Error.policy to TERMINATE instead of UNWIND?  If the policy is UNWIND then allocations should fail, but the app should not terminate.

    Scott 

  • Hi Scott,
    Thanks for your support. I started coding this application from the "sdusbcopy" example provided in the TI-RTOS examples for the Tiva C MCU. I have not changed any settings related to the UNWIND or TERMINATE configuration. Can you tell me exactly where I can change this setting to make the system UNWIND or TERMINATE?

    Thanks

    Regards
    Soumyajit
  • Hi Soumyajit,

    What version of TI-RTOS are you using?

    I opened the latest release and see a “fatsdusbcopy” example (but not one without the ‘fat’).  In that example the error policy is UNWIND.

    Can you open your .cfg file in XGCONF, and then click on the “Error” module in the “Outline” view, and see what the policy is set to?  If you see TERMINATE you can select “UNWIND” from the drop down.  Or, you can explicitly add “Error.policy = Error.UNWIND;” in text at the end of the .cfg file.  And then rebuild the project.

    Does this work?

    Thanks,
    Scott

  • Hi Scott,

    Thanks for your reply. My RTOS version is 2.1.0.3. I also used the same sdusbcopy example (the one using FAT). As expected, I found that the Error policy is set to UNWIND, but the system still terminates on the heap memory error. I should also mention that the error (program termination) happened while I was debugging my application with the JTAG debugger. My application makes a lot of System_printf() calls to print to the debug console of Code Composer Studio, so I think that in debug mode the program terminates because the Ethernet requests are not handled as fast as they would be in normal mode (no debugger connected). When I run the code normally on the MCU (without the debugger connected), the CPU does not halt. In fact, I pressed and held the refresh button (F5) for some 15 seconds in the Google Chrome web browser to make the RTOS terminate or crash, but it is NOT happening now.

    Does using JTAG debug mode somehow switch the UNWIND setting over to TERMINATE? I still suspect that my device may run short of heap memory someday at a customer's site, leading to a program crash and watchdog-based recovery.

    Thanks

    Regards

    Soumyajit

  • Hi Soumyajit,

     

    Thanks for the additional details.

     

    I’m glad to hear it works in standalone mode. But I don’t know why having the debugger attached would cause an issue and termination. I have to look at this more, and I’ll get back to you when I know more or have some ideas of things to try…

     

    Regards,

    Scott

  • Soumyajit,

    Sorry for taking so long to get back. 

    In your case where the program is aborting with the debugger attached… can you please set a breakpoint on the function “xdc_runtime_System_abort__E”, and then run, send the multiple requests, until the breakpoint hits?  And then send a snapshot of the C callstack information shown in the “Debug” tab?

    I think what is happening is that there is an API call somewhere that is specifying a NULL error block parameter.  And if an allocation error happens in that call, the default behavior will be to abort the program, rather than use Error.policy setting.  If we find out where that API call is coming from we should be able to avoid the abort…
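
The behavior described above can be pictured with a toy model. This is not the XDC runtime: ErrBlock, raise_error, toy_alloc, and abort_requested are hypothetical stand-ins for Error_Block, Error_raise, an allocating API call, and System_abort, and the model assumes the UNWIND policy is in effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Error_Block; NOT the XDC runtime type. */
typedef struct {
    bool failed;
} ErrBlock;

/* Stands in for System_abort(): set instead of actually terminating. */
bool abort_requested = false;

/* Toy heap: only tracks how many bytes remain. */
size_t heap_free = 4904;

/* With no error block there is nowhere to record the failure, so the
 * model aborts; with one, the error is recorded and the caller decides. */
void raise_error(ErrBlock *eb)
{
    if (eb == NULL) {
        abort_requested = true;
    } else {
        eb->failed = true;
    }
}

/* Returns true on success; on failure raises an error through eb. */
bool toy_alloc(size_t size, ErrBlock *eb)
{
    if (size > heap_free) {
        raise_error(eb);
        return false;
    }
    heap_free -= size;
    return true;
}
```

With a non-NULL block, a failed call simply returns false and sets eb.failed. With NULL, the same failure sets abort_requested, which corresponds to the path the breakpoint on xdc_runtime_System_abort__E is meant to catch.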

    Thanks,
    Scott

  • Hi Scott,
    Thanks for digging into the matter. Where will I find the function "xdc_runtime_System_abort__E"? A search of the project directory shows the name only inside some .xml files. If that is the case, how do I set a breakpoint there?

    Regards
    Soumyajit
  • Hi Soumyajit,

    The actual source for this function resides in a package in XDCTools.  For example: xdctools_3_31_01_33_core\packages\xdc\runtime\System.c 

    The function name in this file is simply System_abort(); the symbol name I indicated is the expanded name after the build.

    You can set a breakpoint after loading the program by opening the Disassembly viewer (select View->Disassembly in CCS) and then typing in xdc_runtime_System_abort__E.

    Regards,
    Scott 

  • Hi Scott,

       Following your instructions, I have reproduced the error. The program has now halted at "xdc_runtime_System_abort__E" (Asm Disassembly). By C callstack information, did you mean CCS->Menu->Tools->ROV->Viewable_Modules->Task->CallStacks?

    Regards

    Soumyajit

  • Hi Soumyajit,

    OK, good.  Thanks for posting that callstack info from ROV. 

    What I was thinking of was the simpler callstack shown in the regular top-level “Debug” tab that is displayed once the debug session is initiated.  This simpler display shows the basic path to the breakpoint.  For example, from a breakpoint I set on abort for an MSP432 program that intentionally aborts after a memory allocation failure:

     

    When you hit the breakpoint, if you could send the view like the above, hopefully it will show which function and line the abort call started from…

    Thanks,
    Scott

  • Hi Scott,

        I hope this is the screenshot you asked for.

    Regards

    Soumyajit

  • Hi Soumyajit,

    Thanks for attaching the new picture.  This has enough info with function names, file names, and line numbers to show what happened. 

    Starting second from the bottom, in daemon.c (ndk_2_23_01_01\packages\ti\ndk\nettools\daemon\daemon.c), there is a call at line 461 to TaskCreate().  TaskCreate() is shown in the next higher element in the trace, and is in task.c (ndk_2_23_01_01\packages\ti\ndk\os\task.c).  This OSAL function calls the SYS/BIOS Task_create() function at line 300. 


    The call to Task_create() is passing NULL as the Error_Block, so when the allocation attempt fails (as seen in the upper stack trace entries), an error gets raised, which results in the program abort. 

    I just filed a bug report for this: SDOCM00118691.  It isn’t clear yet if there was a specific reason for using NULL versus a real Error_Block.

    If you need a workaround, you should be able to modify TaskCreate() to initialize and pass an Error_Block in the Task_create() call. 

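        Error_Block eb;
        Error_init(&eb);   /* an Error_Block must be initialized before it is passed */

        /* Pass &eb instead of NULL so a failed allocation is reported, not aborted on */
        htsk = Task_create((Task_FuncPtr)pFun, &params, &eb);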

    And then rebuild the NDK core libraries, as described on this wiki page: http://processors.wiki.ti.com/index.php/Rebuilding_The_NDK_Core_Using_Gmake

    I hope this all makes sense.  Thanks for your patience in tracking this down!

    Regards,
    Scott

  • Hi Scott,
    Thanks for resolving the issue by digging up the real cause. I have not yet changed the necessary OS files, but I'll do so soon.
    But we still don't have any clue why this happened in JTAG debug mode and not in normal run-time mode, right? Moreover, we cannot debug the system without running it in debug mode! This might not be a major concern now that the cause of this particular fault is known, but I am still curious to know why it happens.

    Regards
    Soumyajit Das
  • Hi Soumyajit,

    My guess is that in debug mode there is additional RAM allocated for doing I/O with the debug host.  And maybe this RAM reduction provides more opportunity for this particular memory allocation to fail?  This is only a guess.  On Monday I will ask around to see if any co-workers know why, or have other suggestions.

    Regards,
    Scott