RTOS/TM4C1294NCPDT: HeapTrack: A_bufOverflow

Part Number: TM4C1294NCPDT
Other Parts Discussed in Thread: 66AK2H14, SYSBIOS

Tool/software: TI-RTOS

CCS 6.1.3
tirtos_tivac_2_16_01_14(bios_6_45_02_31)
xdctools_3_32_00_06_core
ti-cgt-arm_15.12.1.LTS

I can see a HeapTrack overflow error in a modified version of the example "event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT". Please see the attachments.

"event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_2" is the modified project and "event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_0" is the original.

Please help take a look.

event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_0.zip

event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_2.zip


Thanks

  • You can use "event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_3": in this version the allocated buffer is freed if posting the mail fails.

    I have checked that there is no stack overflow. Note that the stack size is set to 4k.

    event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT_3.zip

  • Hi Jianyi,

    I might have found the issue, but I need to look at it some more. Most of the site is out next week, so I'll follow up with you on Jan 2nd.

    Todd
  • ToddMullanix said:
    I might have found the issue, but I need to look at it some more.

    Todd, I was also able to repeat the failure with the example program from Jianyi.

    The HeapTrack_free() function was detecting that the HeapTrack_STARTSCRIBBLE was incorrect, causing HeapTrack_A_bufOverflow to be raised.

    Using a hardware watchpoint to catch the moment the incorrect scribble value was written, I traced the write to the point where the HeapTrack_alloc() function inserts an entry into the heap tracker queue, in the following call:

        /* Enqueue into the heap's linked list */
        Queue_put(trackQueue, &(tracker->queElem));

    I noticed that the Queue_put() function called by HeapTrack_alloc() disables Hwis while modifying the linked list:

    Void Queue_put(Queue_Object *obj, Queue_Elem *elem)
    {
        UInt key;
    
        key = Hwi_disable();
    
        elem->next = &(obj->elem);
        elem->prev = obj->elem.prev;
        obj->elem.prev->next = elem;
        obj->elem.prev = elem;
    
        Hwi_restore(key);
    }

    Whereas the Queue_remove() function called by HeapTrack_free() doesn't disable Hwis when modifying the linked list:

    Void Queue_remove(Queue_Elem *qelem) 
    {
    #if defined(__IAR_SYSTEMS_ICC__)
        Ptr temp;
        temp = qelem->next;
        qelem->prev->next = temp;
        temp = qelem->prev;
        qelem->next->prev = temp;
    #else
        qelem->prev->next = qelem->next;
        qelem->next->prev = qelem->prev;
    #endif
    }

    Therefore, if multiple tasks are performing allocations / frees using HeapTrack, I think there is a race condition in which the queue of allocations maintained by HeapTrack can become invalid, leading to HeapTrack corrupting memory.

    As an experiment, I modified Queue_remove() to disable Hwis and the example has now run for longer without failing (it got to 20,000,000 iterations without failing, whereas prior to the change it failed after 131,000 iterations). Given that the suspected issue is a race condition, further work is needed to prove this really is the problem.

  • Hi Chester,

    Yep. I went in today to check the test I was running since Friday. I had added a Hwi_disable/restore around the Queue_remove() in HeapTrack_free() also and it was still chugging along. It does appear to be the non-atomic call to Queue_remove(). Annoying since I wrote the code:(

    Jianyi, can you add Hwi_disable()/Hwi_restore() around the Queue_remove() in HeapTrack_free() and rebuild bios? Does this fix your problem?

    Todd

  • Hi, I will run the test.

  • Hi, Chester,
    I wonder how you could use a HW watchpoint to find out what overwrites the scribble? The address of the buffer being allocated and freed varies, and so does the address of the scribble. I thought a HW watchpoint could only be used for a fixed location.
  • Jianyi Bao said:
    I wonder how you could use a HW watchpoint to find out what overwrites the scribble? The address of the buffer being allocated and freed varies, and so does the address of the scribble.

    In this case, when the A_bufOverflow occurred, the address of the scribble and its incorrect value were the same from run to run. I was therefore able to set a hardware watchpoint to trap on the write of the incorrect value to the scribble address, at the point the problem occurred.

    I agree that if the failure didn't occur at the same scribble address and value from run to run, I wouldn't have been able to use a hardware watchpoint to debug the overwrite.

  • Chester Gillon said:
    Therefore, if multiple tasks are performing allocations / frees using HeapTrack, I think there is a race condition in which the queue of allocations maintained by HeapTrack can become invalid, leading to HeapTrack corrupting memory.

    To see if using HeapTrack could corrupt memory in use by the application, as well as corrupting the HeapTrack scribble value, I modified the example to:

    a) Make the CreateMsg() function place a test pattern in the dynamically allocated buffer.

    b) Make the readertask() function verify the test pattern in the dynamically allocated buffer, prior to freeing the buffer.

    The output from readertask() reports the number of bytes in which the test pattern was found to be incorrect, as "numIncorrectDataBytes".

    The modified example shows that readertask() can detect a few bytes of corruption prior to HeapTrack detecting an A_bufOverflow error:

    Implicit posting of Event_Id_02
    read id = 272678 (numIncorrectDataBytes=0)
    Implicit posting of Event_Id_02
    read id = 282777 (numIncorrectDataBytes=0)
    Implicit posting of Event_Id_02
    read id = 292878 (numIncorrectDataBytes=0)
    read id = 302486 (numIncorrectDataBytes=4)
    read id = 302541 (numIncorrectDataBytes=8)
    ti.sysbios.heaps.HeapTrack: line 156: assertion failure: A_bufOverflow: Buffer overflow
    xdc.runtime.Error.raise: terminating execution

    The updated example is attached event_EK_TM4C1294XL_TI_TivaTM4C1294NCPDT.zip

    Changing Queue_remove() to disable Hwis when modifying the queue also prevents the application detecting corruption of the test pattern, as well as preventing the A_bufOverflow.

    Edit: Following the change to Queue_remove(), the example has run for 70,000,000 iterations without error, whereas it failed after approx 302,000 iterations before the change.

  • ToddMullanix said:
    I had added a Hwi_disable/restore around the Queue_remove() in HeapTrack_free() also and it was still chugging along. It does appear to be the non-atomic call to Queue_remove().

    Todd, for consistency with, say, the HeapMem.c implementation, rather than adding Hwi_disable/restore around the Queue_remove(), should Gate_enterModule/Gate_leaveModule be used, either in HeapTrack.c or Queue.c?

    Could HeapTrack "inherit" the gate used by the underlying Heap?

    From a SYS/BIOS point of view I am not sure of the benefit of disabling Hwis vs. using a gate (which can be configured to protect against concurrent access from tasks, Swis or Hwis).

  • Chester Gillon said:
    As an experiment, I modified Queue_remove() to disable Hwis and the example has now run for longer without failing

    In an attempt to create a simpler example to demonstrate the failure, I used just two SYS/BIOS tasks, i.e. no Swis, where the tasks were:

    a) A write_task containing the following in an infinite loop:

    - Calls malloc() to allocate a buffer of a pseudo-random number of words.

    - Fills the buffer with a test pattern (incrementing sequence).

    - Uses Mailbox_post (BIOS_WAIT_FOREVER) for a message which contains the pointer to the buffer, the size of the buffer and the start of the test pattern.

    b) A read_task containing the following in an infinite loop:

    - Calls Mailbox_pend (BIOS_WAIT_FOREVER) to wait for a message from the write_task.

    - Verifies the contents of the test pattern in the buffer.

    - Calls free() to free the buffer which was allocated by the write_task.

    With this two-task program running on a single-core TM4C129 I could not repeat the failure. The failure condition in HeapTrack is that one task has to call Queue_put() from HeapTrack_alloc() while the other task is in Queue_remove() from HeapTrack_free(). With a single-core device, the task scheduling in this program doesn't trigger the failure condition.

    Therefore, I ran the program on the Cortex-A15 cores of a 66AK2H14, with SYS/BIOS 6.50.1.12 configured in SMP mode for four cores. The affinities of the write_task and read_task were set to different cores, to get the tasks running "in parallel". With HeapTrack enabled and no SYS/BIOS modifications the program failed quickly, such that:

    a) The read_task detected one corrupt word in the test pattern approx every 25 messages.

    b) Within 1000 messages the test stopped due to a SYS/BIOS error. The following types of SYS/BIOS errors were seen in ten runs:

    - A data abort exception in the ti_sysbios_heaps_HeapTrack_free__E function.

    - HeapTrack raised an A_bufOverflow assertion failure due to an incorrect scribble word.

    - HeapMem raised an A_invalidFree assertion failure.

    The following modifications stopped the SMP mode test from failing:

    1) Disable HeapTrack (no errors after being left running for 30,000,000 messages).

    2) Leave HeapTrack enabled, and modify Queue_remove() to disable/restore Hwis (no errors after being left running for 150,000,000 messages).

    This also demonstrates that HeapTrack suffers from a race condition when multiple tasks can be performing allocations / frees.

    The SMP example project is attached: 66AK2H14_A15_sys_bios_heap_track.zip. It is set up to run on an EVMK2H in "DSP no boot" mode, using a synchronous group of all four Cortex-A15 cores to load the program.

  • Hi,

    After some testing, I think disabling and restoring Hwis in Queue_remove() has solved this issue. Thanks.

  • Hi Jianyi,

    Thanks for the update. The fix will be to disable interrupts in HeapTrack_free() before calling Queue_remove(). We document that Queue_remove() is not atomic. Adding the Hwi_disable/restore inside Queue_remove() would slightly impact the performance of other calls to Queue_remove() for no added value.

    The bug number is SYSBIOS-604: HeapTrack_free calls Queue_remove in a non-atomic manner

    Todd