This thread has been locked.

malloc throwing exception in library

Other Parts Discussed in Thread: OMAP-L138, SYSBIOS

I have a VERY odd problem.  I am using SYSBIOS 6_34_02_18 built with Code Composer Studio 5.3 and all of its associated latest versions of XDC, compilers, etc., built for an ARM9 platform on an OMAP-L138.

I have assertions turned off in every way I can figure out (building a custom BIOS with assertsEnabled=false, and setting Main.common$.diags_ASSERT, Main.common$.diags_INTERNAL, and Defaults.common$.diags_ASSERT to Diags.ALWAYS_OFF).

If I invoke malloc with a very large number (3083855848) in my main function, I get a printout:

ti.sysbios.heaps.HeapMem: line 307: out of memory: handle=0xc11bf728, size=3083855856

and the function returns NULL.  I'd prefer no printout, but this is fine.

BUT, when I invoke malloc with that same number in a library linked to by my main application, I get an assertion!  I have absolutely no explanation for the mechanism behind this.  How could malloc in my main be different from malloc in my library?  Poking around with breakpoints, both mallocs get resolved to the malloc in my generated configPkg/package/cfg/st_pe9.c file, but in the main we get the printout, and in the library there is an assertion that seems (if I copy and paste the PC, SP, and LR registers into the Registers view) to be getting thrown from line 270 of HeapMem.c with an illegal memory access, i.e., it dies at a random line in the file:

    if (remainSize) {
        newHeader = (HeapMem_Header *)
            ((Memory_Size)allocAddr + adjSize);
        newHeader->next = curHeader->next;   /* <-- seems to die here */
        newHeader->size = remainSize;
        curHeader->next = newHeader;
    }

This is an utter disaster, as it brings my application to an unrecoverable halt in the field.  Any ideas on how I can even formulate a rational question about this?

By the way, I did observe that if I call malloc(0xFFFFFFFF), the size of a small header structure gets added to it and silently wraps around, successfully and erroneously creating a very small buffer.  I'm thinking this is a bug that could cause me some significant problems, since 0xFFFFFFFF is what you get when you read uninitialized flash memory, and the 3rd-party code I am using reads a "size" value from flash and depends on malloc to return NULL in the case of crazy-big numbers.

   Jay

  • Jay,

How do you know you are getting an assertion?  Line 270 of HeapMem.c is not an assertion.  It looks to me like you have a bad de-reference of a pointer, which may be causing an exception.  The code is doing a size check, so I'm not sure why it would get to this line, because line 234 checks whether there is enough memory.

    Is the malloc supposed to fail because you are trying to malloc more memory than is available in the heap?

    Judah

  • Judah,

Sorry, assertion was the wrong word: I should say I am getting an exception there instead of an assertion, and I traced it to that line by setting the PC, SP, and LR registers to the values reported in the exception result.  I expect it is a bad pointer de-reference, but the fundamental question is: why do we get an exception when calling malloc with a huge number from a library, but get a NULL (and a printout) when calling from the main application?  I can't think of any reason this would be the case, unless there is some macro magic going on with the malloc call.

And yes, I have some third-party software that in some situations will end up requesting more memory than is available on the heap, and malloc MUST return NULL in that case.  Basically, it is a flash file system: if the flash has been written with arbitrary data, it reads a "number of sectors" value from the flash and allocates memory based on that number.  In this particular case we read in a huge sector count (in the billions) and call malloc to build an access table, and I need it to fail in a POSIX-compliant way, i.e., return NULL, NOT bring down the whole application by throwing an exception.  Before I "disabled" assertions, I was getting an assertion failure that brought down the system in this case.  Now I am just crashing the system with an exception whenever I call malloc with a too-large value from any library (I've tried it in two separate ones), even though the "correct" behavior happens when calling from code in the executable project.

      Jay

Well, I was able to work around my problem.  My third-party library allows me to provide my "own" malloc, which I defined as:

    #include <stdlib.h>
    #include <xdc/runtime/Memory.h>

    void *FsMalloc(size_t size)
    {
      Memory_Stats stats;

      /* Sanity-check the request against the total heap size before
       * calling through to malloc.  NULL selects the default heap. */
      Memory_getStats(NULL, &stats);
      if (stats.totalSize < size) {
        return NULL;
      }

      return malloc(size);
    }

    Several notes on this:

    1. I would like to request that this check be put into the standard HeapMem_alloc.  This would fix the bug (which is definitely there right now) of having malloc(0xFFFFFFFF) successfully return a small buffer, because 0xFFFFFFFF + sizeof(header) silently wraps around to sizeof(header) - 1.
    2. I noticed in implementing this that I HAD to use stats.totalSize, because by the time I got to my library calls, stats.totalFreeSize and stats.largestFreeSize were both "negative", and since these are unsigned numbers, that means they were in the billions.  This is why the malloc died when I requested a huge number: according to these statistics, it thought it had the space.
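    A sketch of what the check requested in note 1 could look like inside an allocator (this is not the actual SYS/BIOS source; `HEAP_TOTAL`, `header_t`, and `checked_alloc` are illustrative names):

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #define HEAP_TOTAL (64u * 1024u)   /* illustrative total heap size */

    typedef struct { uint32_t next; uint32_t size; } header_t;  /* stand-in header */

    /* Reject requests whose adjusted size wraps around, and requests that
     * could never fit even in a completely empty heap. */
    static void *checked_alloc(uint32_t size)
    {
        uint32_t adj = size + (uint32_t)sizeof(header_t);

        if (adj < size) {          /* addition wrapped: size was near UINT32_MAX */
            return NULL;
        }
        if (adj > HEAP_TOTAL) {    /* larger than the whole heap: cannot succeed */
            return NULL;
        }
        /* ...the normal free-list search would go here... */
        return (void *)1;          /* dummy non-NULL for illustration */
    }

    int main(void)
    {
        printf("huge: %s\n",  checked_alloc(0xFFFFFFFFu) ? "alloc" : "NULL");
        printf("small: %s\n", checked_alloc(128u)        ? "alloc" : "NULL");
        return 0;
    }
    ```

    The two early returns are independent: the first catches the wrap-around, the second catches honest-but-impossible requests, so malloc can fail with NULL instead of corrupting state.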

    Even though I've found a workaround that suffices for me, I would like to pursue understanding how the internal bookkeeping in HeapMem got messed up, so that a) I can fix my code and b) you can make SYSBIOS more robust to whatever stupid thing I did.  Do you have any suggestions for proceeding?

  • Jay,

    Thanks for the information.  I will look into the 0xFFFFFFFF issue and file a bug as necessary.

    As far as how the heap got corrupted goes: you might want to try using the HeapTrack module to help find this.  We developed HeapTrack for the sole purpose of debugging heap issues.  Do you have multiple heaps in your system, or just one?  I'm wondering if you are perhaps calling free with the wrong size?

    Judah

    I tracked down the problem: it turns out part of our code was needlessly stomping on that area of memory, corrupting the heap bookkeeping, and luckily (or unluckily, depending on how you look at it), the ONLY time this caused a problem was when too much memory was requested from the heap and the heap allocator happily went on and tried to touch unmapped memory (because it thought it had BILLIONS of bytes left to allocate).
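    This failure mode can be sketched on a host machine (illustrative only: `Header`, the 1 KB `heap` array, and the stomped size value are stand-ins, not the real HeapMem structures or my actual corruption):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct Header { struct Header *next; uint32_t size; } Header;

    static uint8_t heap[1024];              /* pretend this is the whole heap */

    int main(void)
    {
        Header *cur = (Header *)heap;
        cur->next = NULL;
        cur->size = 3084000000u;            /* stomped: billions, not 1024 */

        uint32_t adjSize = 3083855856u;     /* the failing request from above */

        /* The size check passes against the corrupted bookkeeping... */
        assert(cur->size >= adjSize);

        /* ...so the allocator computes newHeader far outside the real heap,
         * and writing newHeader->next faults on unmapped memory. */
        uintptr_t newAddr = (uintptr_t)heap + adjSize;
        int outside = newAddr >= (uintptr_t)heap + sizeof(heap);
        printf("newHeader outside the heap: %s\n", outside ? "yes" : "no");
        assert(outside);
        return 0;
    }
    ```

    The sketch only computes the bogus address rather than dereferencing it; on the target, the dereference is exactly the `newHeader->next = curHeader->next` line that threw the exception.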

    Adding a sanity check to the heap allocator to make it more robust (and fix the malloc(0xFFFFFFFF) issue) is still a good idea though, IMO.

    Thanks,

        Jay