TDA4VM: mbox timeout error while on/off test

Part Number: TDA4VM

Dear Champs,

My customer is seeing an mbox timeout error in the ti-sci module during a power on/off test.

Could you please check the logs below and suggest how they can debug it?

The issue occurs on my customer's custom board, in about 35% of the cycles of their on/off test.

In this test, they power the system on and off using an external power controller.

(on time 410 sec / off time 10 sec)

Please check the logs below.

- error log.

boot_err_cases.txt

- normal log (no error case).

boot_ok.txt

Thanks and Best Regards,

SI.

  • Hi SI,

    Is this a custom board or a standard TI TDA4VM EVM? 

    (on time 410 sec / off time 10 sec)

    How are they triggering the power on/off?  Software triggered or an external power supply toggling device?

    - Keerthy

  • Hi TI Team,

    It is a custom board we made.
    We test using an external power supply toggling device.

    thank you.
    sungnam

  • Hi Keerthy,

    Have you checked their log and can you suggest how they can debug it?

    Thanks and Best Regards,

    SI.

  • Hi Keerthy,

    They still see this issue in about 35% of the cycles of their on/off test, and this is very urgent as they will start production soon.

    Could you please provide guidance on how they can debug it in the ti-sci module?

    Thanks and Best Regards,

    SI.

  • Hi SI,

    I am trying to isolate whether this mbox timeout is coming from the remoteprocs. Can you disable the remote processor nodes for the DSPs and R5Fs and check whether it still occurs?

    That will give us a clue.

    - Keerthy

  • Hi TI Team,

    If I disable all remote processors, the mbox timeout does not occur.

    I did some additional testing.

    With only the R5F cores (j7-main-r5f0_0, j7-main-r5f0_1) disabled, the symptom still occurs.
    With only the DSP cores (C6x, C7x) disabled, the symptom still occurs.

    thank you.

  • Hi TI Team,

    Here is more information about the "disable all remote processors" test.

    After more iterations of the test, an mbox timeout error does occur in this configuration as well.

    The chance of occurrence is about 10%.


    thank you.
    sungnam,

  • Hi Sungnam,

    What SDK version are you based off, and what firmwares are you using for each of the cores? 

    The mbox_timeout error is a classic symptom of your MCU1_0 firmware not running/executing properly. You can try disabling the MCU1_0 core in kernel dts and see if the issue still persists.

    I see you have a very different memory map compared to our SDK. What changes did you make to your U-Boot to accommodate this?

    regards

    Suman

  • Hi Suman, 

    Thanks for your comments.

    What SDK version are you based off, and what firmwares are you using for each of the cores? 

    -> We are using the 8.0 sdk. 

    ti-processor-sdk-rtos-j721e-evm-08_00_00_12 ,
    ti-processor-sdk-linux-j7-evm-08_00_00_08

    -> We are using the firmware included in RTOS SDK 8.0, with modifications. As mentioned in my previous post,
    the mbox timeout occurs even when none of the firmware is loaded.

     j7-c66_0-fw -> vision_apps_evm/vx_app_rtos_linux_c6x_1.out
     j7-c66_1-fw -> vision_apps_evm/vx_app_rtos_linux_c6x_2.out
     j7-c71_0-fw -> vision_apps_evm/vx_app_rtos_linux_c7x_1.out
     j7-main-r5f0_0-fw -> vision_apps_evm/vx_app_rtos_linux_mcu2_0.out
     j7-main-r5f0_1-fw -> vision_apps_evm/vx_app_rtos_linux_mcu2_1.out
     j7-mcu-r5f0_0-fw -> vision_apps_evm/vx_app_rtos_linux_mcu1_0.out


    The mbox_timeout error is a classic symptom of your MCU1_0 firmware not running/executing properly. You can try disabling the MCU1_0 core in kernel dts and see if the issue still persists.

    -> We are loading MCU1_0 F/W by referring to the document below.

    https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/08_00_00_12/exports/docs/psdk_rtos/docs/user_guide/developer_notes_mcu1_0_sysfw.html#spl-uboot-loading

    I see you have a very different memory map compared to our SDK. What changes did you make to your U-Boot to accommodate this?

    -> We have changed the memory map to optimize memory usage.
    The DTS file referenced by U-Boot has been modified.

    If the memory map had been modified incorrectly, we would expect it to fail every time, but
    it fails only 10-30% of the time.

    Should I rebuild SYSFW if I change the memory map? We use the pre-built images included in the SDK.

    thank you.
    sungnam,

  • Hi Suman,

     

    Is there any update on the issue below?

    As I mentioned in the mail, my customer sees the same issue without loading the MCU1_0 firmware, but they were unable to disable the core in the dts because they do not know how.

    Could you please let them know how to disable it in the dts?

    And is there any other suggestion?

     

    Thanks and Best Regards,

    SI.

  • Hi Sung-IL,

    You need to disable the MCU R5FSS cluster. Just ask your customer to add status = "disabled" under the mcu_r5fss0 node in their board dts file.

    regards

    Suman

  • Hi Suman,

    With the MCU R5FSS cluster disabled, they still see the same mbox error.

    They confirmed that the MCU R5FSS cluster is disabled, as shown below.

    root@j7-evm:/proc/device-tree/bus@100000# grep -r disable * | grep r5
    bus@28380000/r5fss@41000000/r5f@41000000/status:disabled
    bus@28380000/r5fss@41000000/r5f@41400000/status:disabled

    Could you please let me know how they can debug it?

    Thanks and Best Regards,

    SI.
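For readers repeating the check above, the same confirmation can be scripted. This is a minimal sketch, not a TI tool: the helper name `disabled_nodes` is hypothetical, and it assumes the usual Linux layout where each device-tree property under `/proc/device-tree` is a file holding a NUL-terminated string.

```python
import os


def disabled_nodes(root):
    """Return device-tree node paths under `root` whose status is "disabled"."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "status" not in filenames:
            continue
        with open(os.path.join(dirpath, "status"), "rb") as f:
            # String properties are NUL-terminated in the exported tree.
            value = f.read().rstrip(b"\x00").decode("ascii", "replace")
        if value == "disabled":
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


if __name__ == "__main__":
    # Path taken from the grep command quoted in the thread.
    for node in disabled_nodes("/proc/device-tree/bus@100000"):
        if "r5f" in node:
            print(node)
```

Run on the target, it lists the same R5F nodes that the `grep -r disable` command reports, without matching unrelated property contents.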

  • Suman,

    I see you have a very different memory map compared to our SDK. What changes did you make to your U-Boot to accommodate this?

    -> We have changed the memory map to optimize memory usage.
    The DTS file referenced by U-Boot has been modified.

    If the memory map had been modified incorrectly, we would expect it to fail every time, but
    it fails only 10-30% of the time.

    Should I rebuild SYSFW if I change the memory map? We use the pre-built images included in the SDK.

    Could you please comment on whether they should rebuild SYSFW for the memory map change and, if so, how they can do it?

    Thanks and Best Regards,

    SI.

  • Hi Suman,

    Do you have any idea how this issue can be debugged?

    Thanks and Best Regards,

    SI.

  • Hi TI Team,

    We can still reproduce the symptom frequently.
    Is there any updated information regarding debugging?

    Also, please answer the question below, which we asked previously.

    "Should I build SYSFW if I change the memory map? We use pre-built images included in the SDK."
    If I need to rebuild, where can I find the source code and the build guide?

    thank you,
    sungnam.

  • Hi Sungnam,

    The SYSFW code and data are completely placed within the subsystem's Instruction and Data RAMs. So, they don't require any updates.

    What firmware are you using on MCU1_0? Is it TI firmware with adjusted linker cmd files, or your own custom firmware?

    We need to identify whether it is a memory-overlap issue or a thread priority issue in MCU1_0 firmware. Please do not use any remoteproc firmware on any of the MAIN domain R5Fs or DSPs. MCU1_0 is mandatory, so let's debug the MCU1_0 issue.

    Can you provide the readelf -l output of your MCU1_0 firmware image?

    regards

    Suman

  • Hi Suman,

    The linker cmd file was tweaked as follows with the changed memory map:

    MEMORY
    {
    /* R5F_TCMA [ size 32.00 KB ] */
    R5F_TCMA_VECS (X) : ORIGIN = 0x00000000 LENGTH = 0x00000040
    R5F_TCMA (X) : ORIGIN = 0x00000040 LENGTH = 0x00007FC0
    /* R5F_TCMB0_VECS [ size 256 B ] */
    R5F_TCMB0_VECS ( RWIX ) : ORIGIN = 0x41010000 , LENGTH = 0x00000040
    /* R5F_TCMB0 [ size 31.75 KB ] */
    R5F_TCMB0 ( RWIX ) : ORIGIN = 0x41010040 , LENGTH = 0x00007FC0

    /* DDR for MCU1_0 for Linux IPC [ size 1024.00 KB ] */
    DDR_MCU1_0_IPC ( RWIX ) : ORIGIN = 0xB8000000 , LENGTH = 0x00100000
    /* DDR for MCU1_0 for Linux resource table [ size 1024 B ] */
    DDR_MCU1_0_RESOURCE_TABLE ( RWIX ) : ORIGIN = 0xB8100000 , LENGTH = 0x00000400
    /* DDR for MCU1_0 for code/data [ size 12.00 MB ] */
    DDR_MCU1_0 ( RWIX ) : ORIGIN = 0xB8400000 , LENGTH = 0x00C00000

    /* Memory for IPC Vring's. MUST be non-cached or cache-coherent [ size 32.00 MB ] */
    IPC_VRING_MEM : ORIGIN = 0xC8000000 , LENGTH = 0x02000000
    /* Memory for remote core logging [ size 256.00 KB ] */
    APP_LOG_MEM : ORIGIN = 0xCA000000 , LENGTH = 0x00040000
    /* Memory for TI OpenVX shared memory. MUST be non-cached or cache-coherent [ size 63.62 MB ] */
    TIOVX_OBJ_DESC_MEM : ORIGIN = 0xCA040000 , LENGTH = 0x03FA0000

    /* Memory for shared memory buffers in DDR [ size 320.00 MB ] */
    DDR_SHARED_MEM : ORIGIN = 0xD0000000 , LENGTH = 0x14000000

    /* DDR for MCU1_0 for local heap [ size 8.00 MB ] */
    DDR_MCU1_0_LOCAL_HEAP ( RWIX ) : ORIGIN = 0xE6000000 , LENGTH = 0x00800000
    }

    Even when we load no firmware other than MCU1_0, the mbox timeout occurs in the same way.

    Here is the requested readelf output.

    $ readelf -l vx_app_rtos_linux_mcu1_0.out

    Elf file type is EXEC (Executable file)
    Entry point 0x41010000
    There are 14 program headers, starting at offset 275196

    Program Headers:
      Type  Offset    VirtAddr    PhysAddr    FileSiz  MemSiz   Flg  Align
      LOAD  0x000038  0x41010000  0x41010000  0x010b0  0x010b0  R E  0x8
      LOAD  0x002000  0xb8100000  0xb8100000  0x0008c  0x0008c  R    0x1000
      LOAD  0x004000  0xb8400000  0xb8400000  0x00000  0xdd573  RW   0x2000
      LOAD  0x004000  0xb84dd578  0xb84dd578  0x00868  0x00868  R    0x8
      LOAD  0x006000  0xb84de000  0xb84de000  0x00000  0x5e000  RW   0x2000
      LOAD  0x006000  0xb853c000  0xb853c000  0x2a6a0  0x2a6a0  R E  0x8
      LOAD  0x0306a0  0xb85666a0  0xb85666a0  0x00000  0x10000  RW   0x8
      LOAD  0x0306a0  0xb85766a0  0xb85766a0  0x0c0a0  0x0c0a0  R    0x8
      LOAD  0x03c780  0xb8582780  0xb8582780  0x00000  0x05800  RW   0x80
      LOAD  0x03c780  0xb8587f80  0xb8587f80  0x04710  0x04710  R    0x4
      LOAD  0x040f00  0xb858c700  0xb858c700  0x01684  0x01684  R    0x80
      LOAD  0x042584  0xb858dd84  0xb858dd84  0x00000  0x011f8  RW   0x4
      LOAD  0x042584  0xb8ff6c00  0xb8ff6c00  0x00000  0x01400  R    0x4
      LOAD  0x042588  0xb8ff8000  0xb8ff8000  0x00000  0x08000  RW   0x8

    Section to Segment mapping:
    Segment Sections...
    00 .freertosrstvectors .bootCode .startupCode .startupData .text.hwi .text.cache .text.mpu .text.boot .mpu_cfg
    01 .resource_table
    02 .bss .tracebuf __llvm_prf_cnts
    03 .cinit
    04 .bss:taskStackSection
    05 .text
    06 .sysmem
    07 .const
    08 .data
    09 .const.devgroup*
    10 .boardcfg_data
    11 .bss.devgroup*
    12 .irqStack .fiqStack .abortStack .undStack .svcStack
    13 .stack

    Thank you,

    Sungnam.
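One way to cross-check the two listings in the post above is to confirm that every LOAD segment reported by `readelf -l` falls inside the regions declared in the linker MEMORY block. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only the TCMB0 and MCU1_0 DDR regions from the quoted linker file are included, and adjacent regions are merged because the first segment starts in R5F_TCMB0_VECS and continues into R5F_TCMB0.

```python
import re

# Subset of the linker MEMORY regions quoted above: name -> (origin, length).
REGIONS = {
    "R5F_TCMB0_VECS": (0x41010000, 0x00000040),
    "R5F_TCMB0": (0x41010040, 0x00007FC0),
    "DDR_MCU1_0_IPC": (0xB8000000, 0x00100000),
    "DDR_MCU1_0_RESOURCE_TABLE": (0xB8100000, 0x00000400),
    "DDR_MCU1_0": (0xB8400000, 0x00C00000),
}

# LOAD <offset> <vaddr> <paddr> <filesz> <memsz> ...
LOAD_RE = re.compile(r"^\s*LOAD\s+" + r"\s+".join([r"(0x[0-9a-fA-F]+)"] * 5))


def parse_load_segments(readelf_text):
    """Return (phys_addr, mem_size) for every LOAD line of `readelf -l` output."""
    segments = []
    for line in readelf_text.splitlines():
        m = LOAD_RE.match(line)
        if m:
            _off, _vaddr, paddr, _filesz, memsz = (int(g, 16) for g in m.groups())
            segments.append((paddr, memsz))
    return segments


def merge_regions(regions):
    """Merge touching or overlapping regions into [start, end) spans."""
    spans = []
    for start, size in sorted(regions.values()):
        end = start + size
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return spans


def stray_segments(segments, regions):
    """List segments that are not fully contained in any merged span."""
    spans = merge_regions(regions)
    return [
        (addr, size)
        for addr, size in segments
        if not any(lo <= addr and addr + size <= hi for lo, hi in spans)
    ]


# A few of the LOAD lines from the readelf output quoted above.
SAMPLE = """\
LOAD 0x000038 0x41010000 0x41010000 0x010b0 0x010b0 R E 0x8
LOAD 0x002000 0xb8100000 0xb8100000 0x0008c 0x0008c R 0x1000
LOAD 0x004000 0xb8400000 0xb8400000 0x00000 0xdd573 RW 0x2000
LOAD 0x042588 0xb8ff8000 0xb8ff8000 0x00000 0x08000 RW 0x8
"""

if __name__ == "__main__":
    for addr, size in stray_segments(parse_load_segments(SAMPLE), REGIONS):
        print(f"segment {addr:#010x}+{size:#x} lies outside the linker regions")
```

A segment reported here would point at a placement problem inside the firmware image itself. An empty result only says the image matches its own linker file; it does not show that those regions are free of conflicts with Linux or the other cores.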

  • Hi Sungnam,

    Are you customizing MCU1_0 firmware or including the Vision Apps MCU1_0 firmware as is (other than the linker map adjustments and adjusting for various memory-map update macros like IPC_VRING_MEM_ADDR)?

    Can you replace the MCU1_0 firmware with the PDK IPC ipc_echo_testb firmware, built with the updated memory map and IPC region change, and see if the issue still persists? The primary goal is to rule out any issues with the Vision Apps MCU1_0 firmware (SDK testing typically uses the PDK IPC firmware image).

    regards

    Suman

  • Hi Suman,

    We changed the memory map with the Python tool provided by TI.
    I confirmed that each memory address constant in the app_mem_map.h file was updated to match the changed memory map.

    /* DDR for MCU1_0 for Linux IPC [ size 1024.00 KB ] */
    #define DDR_MCU1_0_IPC_ADDR (0xB8000000u)
    #define DDR_MCU1_0_IPC_SIZE (0x00100000u)

    /* DDR for MCU1_0 for all sections, used for reserving memory in DTS file [ size 15.00 MB ] */
    #define DDR_MCU1_0_DTS_ADDR (0xB8100000u)
    #define DDR_MCU1_0_DTS_SIZE (0x00F00000u)

    /* DDR for MCU1_0 for local heap [ size 8.00 MB ] */
    #define DDR_MCU1_0_LOCAL_HEAP_ADDR (0xE6000000u)
    #define DDR_MCU1_0_LOCAL_HEAP_SIZE (0x00800000u)

    /* Memory for IPC Vring's. MUST be non-cached or cache-coherent [ size 32.00 MB ] */
    #define IPC_VRING_MEM_ADDR (0xC8000000u)
    #define IPC_VRING_MEM_SIZE (0x02000000u)

    /* Memory for remote core logging [ size 256.00 KB ] */
    #define APP_LOG_MEM_ADDR (0xCA000000u)
    #define APP_LOG_MEM_SIZE (0x00040000u)

    We use the mcu1_0 firmware included in vision_apps without modification.
    We only applied the changed memory map.

    Can you please tell me how to apply and test the ipc_echo_testb firmware?
    I only know how to use the pre-built "ipc_echo_testb_mcu1_0_release_strip".

    ifeq ($(BUILD_CPU_MCU1_0),yes)
    # copy remote firmware files for mcu1_0
    $(eval IMAGE_NAME := vx_app_rtos_linux_mcu1_0.out)
    cp $(VISION_APPS_PATH)/out/J7/R5F/$(RTOS)/$(LINUX_APP_PROFILE)/$(IMAGE_NAME) $(LINUX_FS_STAGE_PATH)/lib/firmware/$(FIRMWARE_SUBFOLDER)/.
    $(TIARMCGT_ROOT)/bin/armstrip -p $(LINUX_FS_STAGE_PATH)/lib/firmware/$(FIRMWARE_SUBFOLDER)/$(IMAGE_NAME)
    ln -sr $(LINUX_FS_STAGE_PATH)/lib/firmware/$(FIRMWARE_SUBFOLDER)/$(IMAGE_NAME) $(LINUX_FS_STAGE_PATH)/lib/firmware/j7-mcu-r5f0_0-fw
    else
    # Copy MCU1_0 firmware which is used in the default uboot
    ln -sr $(LINUX_FS_STAGE_PATH)/lib/firmware/pdk-ipc/ipc_echo_testb_mcu1_0_release_strip.xer5f $(LINUX_FS_STAGE_PATH)/lib/firmware/j7-mcu-r5f0_0-fw
    endif

    Thank you,

    Sungnam.
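Since a memory overlap was one of the suspected causes, the app_mem_map.h constants quoted in the post above can be checked mechanically. This is a minimal sketch, assuming only the five regions shown in that post plus the two MCU1_0 DDR regions from the earlier linker file; the helper names `find_overlaps` and `covers` are hypothetical.

```python
from itertools import combinations

# Regions from the app_mem_map.h excerpt: name -> (address, size).
CARVEOUTS = {
    "DDR_MCU1_0_IPC": (0xB8000000, 0x00100000),
    "DDR_MCU1_0_DTS": (0xB8100000, 0x00F00000),
    "DDR_MCU1_0_LOCAL_HEAP": (0xE6000000, 0x00800000),
    "IPC_VRING_MEM": (0xC8000000, 0x02000000),
    "APP_LOG_MEM": (0xCA000000, 0x00040000),
}

# MCU1_0 DDR regions from the linker MEMORY block posted earlier.
LINKER_DDR = {
    "DDR_MCU1_0_RESOURCE_TABLE": (0xB8100000, 0x00000400),
    "DDR_MCU1_0": (0xB8400000, 0x00C00000),
}


def overlaps(a, b):
    """True if half-open ranges a and b, given as (start, size), intersect."""
    (a_start, a_size), (b_start, b_size) = a, b
    return a_start < b_start + b_size and b_start < a_start + a_size


def find_overlaps(regions):
    """Return every pair of region names whose address ranges intersect."""
    return [
        (n1, n2)
        for (n1, r1), (n2, r2) in combinations(regions.items(), 2)
        if overlaps(r1, r2)
    ]


def covers(outer, inner):
    """True if range `inner` lies entirely inside range `outer`."""
    (o_start, o_size), (i_start, i_size) = outer, inner
    return o_start <= i_start and i_start + i_size <= o_start + o_size


if __name__ == "__main__":
    print("overlapping pairs:", find_overlaps(CARVEOUTS))
    for name, region in LINKER_DDR.items():
        ok = covers(CARVEOUTS["DDR_MCU1_0_DTS"], region)
        print(f"{name} inside DDR_MCU1_0_DTS: {ok}")
```

The same functions can be pointed at the full region list, including the carve-outs for the other cores and the Linux reserved-memory nodes, which are not shown in the thread.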

  • Hi Suman,

    Any updates?

    Thank you,

    Sungnam.

  • Hi TI team,

    We solved the problem.

    Thank you,

    sungnam.