TDA4VEN-Q1: c7x_x gets stuck in function HeapP_construct

Part Number: TDA4VEN-Q1

Hi, TI experts:

During internal power cycling tests on our custom board, the C7x gets stuck inside the function HeapP_construct. This happens on both C7x_1 and C7x_2. When the issue occurs, MCU2_0 operates normally. The failure rate varies between 1 in 22 and 1 in 200 power cycles.

When the hang occurs, it is not possible to connect to the C7x core via CCS.

We discovered several anomalies with C7x_1 and C7x_2 (everything is normal on MCU2_0):

A. Variable assignments appear to have no effect (this is seen even when the C7x operates normally). In the vHeapCreateStatic function within Heap_internal.c, there is an assignment: heap->xTotalHeapSize = xTotalHeapSize; Printing heap->xTotalHeapSize after the assignment yields an incorrect value, although it should display the value of xTotalHeapSize.

B. sizeof(struct_type) evaluates to zero (this is also seen even when the C7x operates normally). In the function HeapP_construct, there is an assertion: DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) ); By adding print statements, we found that sizeof(HeapP_Object) is reported as zero, whereas on MCU2_0 it is 128 bytes. This structure is supposed to be non-empty.
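
One thing worth ruling out for anomaly B (an assumption, not something confirmed in this thread): the debug patch prints the size_t values size1 and size2 with %d. If size_t is wider than int on the C7x toolchain, a variadic print can read the wrong bytes and show an incorrect value even though sizeof itself is correct. Below is a minimal host-side sketch of a safer check. StaticHeapSketch_t, HeapObjectSketch_t and format_size are hypothetical stand-ins, not the real SDK types or APIs, and the sketch assumes the logging function understands %zu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for StaticHeap_t and HeapP_Object.
 * The real SDK layouts differ; only the size relationship matters here. */
typedef struct {
    void  *start;
    void  *end;
    size_t free_bytes;
    size_t total_bytes;
} StaticHeapSketch_t;

typedef struct {
    uint64_t rsv[16]; /* 16 * 8 = 128 bytes of opaque storage */
} HeapObjectSketch_t;

/* Compile-time version of the size check: the build fails if the opaque
 * object cannot hold the internal heap state, so no runtime print is needed. */
_Static_assert(sizeof(StaticHeapSketch_t) < sizeof(HeapObjectSketch_t),
               "opaque heap object is too small for the internal heap state");

/* Format a size_t with the matching %zu specifier.
 * Returns the number of characters written (excluding the terminator). */
int format_size(char *buf, size_t len, size_t value)
{
    return snprintf(buf, len, "%zu", value);
}
```

If the logger does not support %zu, casting the value to unsigned int and printing it with %u is a common fallback, provided the value fits.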

During power cycling tests on the EVM board, the same issue occurred once out of 400 cycles (log attached: ti-evm-c7x-hang.txt).

ti-evm-c7x-挂.txt

Could you please give us some guidance on how to analyze this issue?

Our SDK version:

RTOS: 10.00.00.05 (19 Aug 2024)

LINUX: 10.00.00.08 (19 Aug 2024)

BRS

  • Hi xie jc,

    Can you please share some more details as to when this issue occurs? Is it when running a specific use case? Is there any memory corruption?

    Regards,

    Brijesh

  • Hi, Brijesh,

    We can reproduce this issue on the EVM board using SPL boot mode. We only added some print statements to the C7x code; I think there is no specific use case so far. Could you add some logs to the C7x code and give it a try? I did not see any memory corruption when the issue occurred.

    You can refer to the log I attached above: ti-evm-c7x-挂.txt

    Steps to reproduce:

    1. Create the file boot_script.service with the following content:

    [Unit]
    Description=/etc/rc.local Compatibility
    ConditionPathExists=/etc/boot_script

    [Service]
    Type=forking
    ExecStart=/etc/boot_script
    TimeoutSec=0
    StandardOutput=tty
    RemainAfterExit=yes
    SysVStartPriority=99
    [Install]
    WantedBy=multi-user.target

    2. Create the file boot_script with the following content (I assume that if all cores boot normally, the /dev/rpmsg* nodes will exist):

    #!/bin/bash
    # file /etc/boot_script

    while [ ! -e "/dev/rpmsg_ctrl3" ];do
    sleep 1
    done

    while [ ! -e "/dev/rpmsg6" ];do # maybe rpmsg7?
    sleep 1
    done

    reboot  -f 

    3. Copy both files into the root filesystem and enable the service:

    cp -rf boot_script.service rootfs/etc/systemd/system/
    cp -rf boot_script rootfs/etc/
    cd rootfs/etc/systemd/system/multi-user.target.wants/
    ln -sf /etc/systemd/system/boot_script.service boot_script.service

    4. Apply the following patch, which adds debug prints:

    6521.patch.txt
    commit 61dbffa3c201678f47969826cdda3e48f07e1f9b
    Author: Yihao.Che <cheyihao@mit.cn>
    Date:   Mon Dec 23 15:13:35 2024 +0800
    
        for patch shared
        
        Change-Id: If25289663e180f1062e3cd2ea1c9cbf378eb2db6
    
    diff --git a/psdkra/app_utils/utils/mem/src/app_mem_free_rtos.c b/psdkra/app_utils/utils/mem/src/app_mem_free_rtos.c
    index 3e26896b7..53966e627 100755
    --- a/psdkra/app_utils/utils/mem/src/app_mem_free_rtos.c
    +++ b/psdkra/app_utils/utils/mem/src/app_mem_free_rtos.c
    @@ -189,7 +189,7 @@ int32_t appMemInit(app_mem_init_prm_t *prm)
         {
             g_app_mem_obj.target2SharedFxn = NULL;
         }
    -#endif  
    +#endif
     
         for(heap_id = 0; heap_id < APP_MEM_HEAP_MAX; heap_id++)
         {
    @@ -251,7 +251,10 @@ int32_t appMemInit(app_mem_init_prm_t *prm)
     
                     heap_prm->base   = heap_buf;
                     heap_prm->size   = heap_size;
    -                HeapP_construct(&heap_obj->rtos_heap_handle, heap_buf, heap_size);
    +
    +				appLogPrintf("%s: %d: MEM: Init22-1 ... !!! heap_buf: %p, heap_size: 0x%x\n", __func__, __LINE__, heap_buf, heap_size);
    +                HeapP_construct(&heap_obj->rtos_heap_handle, heap_buf, heap_size, &size1, &size2);
    +				appLogPrintf("%s: %d: MEM: Init33 ... !!! heap_buf: %p, heap_size: 0x%x,  size1: %d < size2: %d ?\n", __func__, __LINE__, heap_buf, heap_size, size1, size2);
     #endif
                 appLogPrintf("MEM: Created heap (%s, id=%d, flags=0x%08x) @ %p of size %d bytes !!!\n",
                     heap_prm->name,
    @@ -599,7 +602,7 @@ void  appMemCacheInv(void *ptr, uint32_t size)
         CacheP_inv(
             ptr,
             APP_MEM_ALIGN32(size, APP_MEM_ALIGN_MIN_BYTES),
    -        CacheP_TYPE_L1D);  
    +        CacheP_TYPE_L1D);
     #endif
         #endif
         appMemFence();
    @@ -850,7 +853,7 @@ int32_t appMemAddrTranslate(app_mem_rat_prm_t *prm)
             HwiP_restore(key);
         }
     #ifdef LDRA_UNTESTABLE_CODE
    -/* TIOVX-1770- LDRA Uncovered Id: TIOVX_CODE_COVERAGE_MEM_FREE_RTOS_UM09 */   
    +/* TIOVX-1770- LDRA Uncovered Id: TIOVX_CODE_COVERAGE_MEM_FREE_RTOS_UM09 */
         else
         {
             appLogPrintf("appMemAddrTranslate(): pRatRegs has not been set.  Use appMemSetRatRegs function to set before calling\n");
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/examples/kernel/dpl/dpl_demo/dpl_demo.c b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/examples/kernel/dpl/dpl_demo/dpl_demo.c
    index 1756df157..65a1f10a5 100755
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/examples/kernel/dpl/dpl_demo/dpl_demo.c
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/examples/kernel/dpl/dpl_demo/dpl_demo.c
    @@ -134,9 +134,10 @@ void dpl_demo_main(void *args)
         {
             void *ptr1;
             uint32_t size1 = 1023u;
    +        size_t size2 = 0, size3 = 0;
     
             /* create heap */
    -        HeapP_construct(&gMyHeapObj, gMyHeapMem, MY_HEAP_MEM_SIZE);
    +        HeapP_construct(&gMyHeapObj, gMyHeapMem, MY_HEAP_MEM_SIZE, &size2, &size3);
     
             DebugP_log("[DPL] Heap free size = %d bytes\r\n",
                 (uint32_t)HeapP_getFreeHeapSize(&gMyHeapObj)
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/dpl/HeapP.h b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/dpl/HeapP.h
    index a3cdb298a..f2d2c3cee 100755
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/dpl/HeapP.h
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/dpl/HeapP.h
    @@ -90,7 +90,7 @@ typedef struct HeapP_Object_ {
      * \param heapAddr  [in] Base address of memory to be used as heap
      * \param heapSize  [in] Size of memory block that is to be used as heap
      */
    -void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize );
    +void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize, size_t *size1, size_t *size2);
     
     /**
      * \brief Delete the user defined heap
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/freertos/dpl/common/HeapP_freertos.c b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/freertos/dpl/common/HeapP_freertos.c
    index 8016c2879..921716328 100755
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/freertos/dpl/common/HeapP_freertos.c
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/freertos/dpl/common/HeapP_freertos.c
    @@ -29,21 +29,21 @@
      *  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
      *  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
      */
    -
    -
     #include <stdlib.h>
     #include <kernel/dpl/DebugP.h>
     #include <kernel/nortos/dpl/common/HeapP_internal.h>
     #include <FreeRTOS.h>
     #include <task.h>
     
    -void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize )
    +void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize,  size_t *size1, size_t *size2)
     {
    -    DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );    
    -
    +    DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );
    +	*size1 = sizeof(StaticHeap_t);
    +	*size2 = sizeof(HeapP_Object);
         vHeapCreateStatic((StaticHeap_t*)heap, heapAddr, heapSize);
     }
     
    +
     void   HeapP_destruct(HeapP_Object *heap)
     {
         vTaskSuspendAll();
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/nortos/dpl/common/HeapP_nortos.c b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/nortos/dpl/common/HeapP_nortos.c
    index 42afaf050..9e377cf78 100755
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/nortos/dpl/common/HeapP_nortos.c
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/nortos/dpl/common/HeapP_nortos.c
    @@ -35,10 +35,11 @@
     #include <kernel/dpl/DebugP.h>
     #include <kernel/nortos/dpl/common/HeapP_internal.h>
     
    -void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize )
    +void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize,  size_t *size1, size_t *size2)
     {
    -    DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );
    -
    +    //DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );
    +	*size1 = sizeof(StaticHeap_t);
    +	*size2 = sizeof(HeapP_Object);
         vHeapCreateStatic((StaticHeap_t*)heap, heapAddr, heapSize);
     }
     
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/safertos/dpl/common/HeapP_safertos.c b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/safertos/dpl/common/HeapP_safertos.c
    index 41d8eaf0c..035f2eaaa 100644
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/safertos/dpl/common/HeapP_safertos.c
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/source/kernel/safertos/dpl/common/HeapP_safertos.c
    @@ -37,13 +37,15 @@
     #include <SafeRTOS.h>
     #include <task.h>
     
    -void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize )
    +void   HeapP_construct( HeapP_Object *heap, void *heapAddr, size_t heapSize,  size_t *size1, size_t *size2)
     {
    -    DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );
    -
    +    //DebugP_assert( sizeof(StaticHeap_t) < sizeof(HeapP_Object) );
    +	*size1 = sizeof(StaticHeap_t);
    +	*size2 = sizeof(HeapP_Object);
         vHeapCreateStatic((StaticHeap_t*)heap, heapAddr, heapSize);
     }
     
    +
     void   HeapP_destruct(HeapP_Object *heap)
     {
         vTaskSuspendScheduler();
    diff --git a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/test/kernel/dpl/test_dpl.c b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/test/kernel/dpl/test_dpl.c
    index 4e840a7bf..985c7ac39 100755
    --- a/psdkra/mcu_plus_sdk_j722s_10_00_00_25/test/kernel/dpl/test_dpl.c
    +++ b/psdkra/mcu_plus_sdk_j722s_10_00_00_25/test/kernel/dpl/test_dpl.c
    @@ -545,9 +545,10 @@ void test_heap(void *args)
         uint32_t size[NUM_ALLOCS] = {255, 232, 255, 127, 63, 31, 15, 7, 3, 1};
         uint32_t freeSize;
         int32_t i;
    +    size_t size1 = 0, size2 = 0;
     
         /* create heap */
    -    HeapP_construct(&gMyHeap, gMyHeapMem, MY_HEAP_MEM_SIZE);
    +    HeapP_construct(&gMyHeap, gMyHeapMem, MY_HEAP_MEM_SIZE, &size1, &size2);
     
         freeSize = HeapP_getFreeHeapSize(&gMyHeap);
     
    

    If everything is normal, the boot_script will execute a restart. However, if the C7x hangs or there are other issues, the boot_script will stop.

    Thanks

  • Hi Brijesh,

    Could you please take a look at this thread? The issue can be reproduced on the EVM; please give it a try and help fix it as soon as possible. Thanks for your support!

    BR,

    Biao

  • Hi Biao, jc,

    I freshly created an SD card from PSDKRA 10.0 and installed your script. After running the script for almost 45 minutes, I could see a hang on C7x_2, so this issue is very hard to reproduce. I will check further why the C7x is hanging.

    Regards,

    Brijesh

  • Hello Biao, Jc,

    Do you see the hang on C7x with the prints below? This is what I am seeing on J722S after several hundred reboots. Why do you think this is related to the heap memory?

    [C7x_2 ] 245.070897 s: IPC: 4 CPUs participating in IPC !!!
    [C7x_2 ] 245.071215 s: IPC: Waiting for HLOS to be ready ... !!!
    [C7x_2 ] 257.865183 s: IPC: HLOS is ready !!!
    [C7x_2 ] 257.865265 s: IPC: Init ... Done !!!
    [C7x_2 ] 257.865282 s: APP: Syncing with 3 CPUs ... !!!

    Regards,

    Brijesh

  • Hi, Brijesh,

    There are indeed several different hang cases on the EVM board. If you try a few more times, you should be able to reproduce the situation we encountered, where the C7x is stuck inside the function HeapP_construct.

    BRs
  • Hi xie jc,

    Yes, this seems to be a fairly involved debug, so it will take some time to get a fix for it.

    Regards,

    Brijesh

  • Hi Brijesh,

    Thank you for the update. If there's anything I can do to assist, please let me know.

    BRs

  • Hi jc,

    There is definitely a page fault on C7x, but we are still trying to figure out where it is coming from.

    Regards,

    Brijesh

  • Hi, Brijesh,

    It sounds like good news, and it seems we’re close to resolving the issue.

    BRs

  • Hi jc,

    It's strange that even though an MMU entry is available for the MMR region, we are seeing a page fault when accessing some of these registers, and only rarely. We are trying to figure out why this is failing. I suspect some MMU configuration.

    Regards,

    Brijesh 

  • Hi jc,

    We found an issue in the U-Boot MMU settings that could explain the problems we are seeing. Could you please apply the attached patch on top of the board-support\ti-u-boot-2024.04+git folder, rebuild U-Boot, and try with the updated U-Boot?

    /cfs-file/__key/communityserver-discussions-components-files/791/MMU_5F00_Fix_5F00_For_5F00_Vision_5F00_apps.patch

    This should fix the issues that you are observing on C7x. 

    Regards,

    Brijesh

  • Hi, Brijesh,

    Great!!! We'll test it after applying the patch, and we'll get back to you with any updates.

    Thanks 

    BRs

  • Hi jc,

    Thanks. By the way, I ran this test for more than 60 hours and did not see the same crash on C7x again, so this fix is promising. Hopefully it fixes your issue as well.

    Regards,

    Brijesh

  • Hi, Brijesh, 

    Thanks. Sounds very promising. ^O^

    BRs.

  • Hi, Brijesh,

    Good news: we have tested for hours and have not been able to reproduce the issue so far. Could you briefly explain the troubleshooting process and why this modification was made?

    Thanks

    BRs

  • Hi, Brijesh,

    To add, what is the difference between MT_NORMAL_NC and MT_NORMAL?

    Thanks

    BRs

  • Hi jc,

    Essentially, the MMU entries in U-Boot were not properly carving out the sections used for the remote core firmware. This memory was marked as cached (MT_NORMAL), which was creating the problem: it was corrupting the TLM memory of the C7x and causing the C7x to crash (exception).

    Marking this memory region (the memory used by the firmware) as non-cached (MT_NORMAL_NC) fixed the issue.
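
The carve-out idea can be sketched as a small lookup table. This is a simplified illustration only: region_t, mem_map, find_region and all addresses and sizes are hypothetical, and the real U-Boot memory map and the attached patch use different structures and values. The point it shows is that the range owned by the remote cores gets its own non-cached entry instead of being covered by the cached DDR entry, so the host cache has no reason to hold, and later write back, lines belonging to that range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative memory types; the names echo MT_NORMAL / MT_NORMAL_NC
 * from the thread, but the enum values are arbitrary. */
typedef enum {
    MEM_NORMAL,    /* cached normal memory */
    MEM_NORMAL_NC  /* non-cached normal memory */
} mem_type_t;

typedef struct {
    uint64_t   base;
    uint64_t   size;
    mem_type_t type;
} region_t;

/* Hypothetical layout: DDR is cached except for the firmware carve-out. */
static const region_t mem_map[] = {
    { 0x80000000ULL, 0x20000000ULL, MEM_NORMAL    }, /* DDR below the carve-out */
    { 0xA0000000ULL, 0x10000000ULL, MEM_NORMAL_NC }, /* remote core firmware    */
    { 0xB0000000ULL, 0x50000000ULL, MEM_NORMAL    }, /* DDR above the carve-out */
};

/* Return the region containing addr, or NULL if addr is not mapped. */
const region_t *find_region(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(mem_map) / sizeof(mem_map[0]); i++) {
        const region_t *r = &mem_map[i];
        if (addr >= r->base && addr - r->base < r->size) {
            return r;
        }
    }
    return NULL;
}
```

With this table, find_region(0xA0001000) returns the non-cached firmware entry, while addresses on either side resolve to cached DDR entries.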

    Regards,

    Brijesh

  • Hi, Brijesh,

    Got it, thank you

    BRs

  • Hi jc,

    I guess this issue is resolved, so I am closing this thread.

    Regards,

    Brijesh