TDA4VM: QNX+RTOS c66/c71 boot hangs

Part Number: TDA4VM

Hi TI experts,

The C66 or C71 boot hangs on our custom board (Processor SDK RTOS version 08_00_00_12, QNX version 7.1). The UART output looks like this:

-----------------------------------------------------------------

U-Boot 2020.01-dirty (Aug 12 2021 - 16:50:19 +0800)

SoC: J721E SR1.0
Model: Texas Instruments K3 J721E SoC
Board: J721EX-PM2-SOM rev E7
DRAM: 4 GiB
Flash: 0 Bytes
MMC: sdhci@4f80000: 0, sdhci@4fb0000: 1
Loading Environment from MMC... OK
In: serial@2800000
Out: serial@2800000
Err: serial@2800000
Net: Could not get PHY for ethernet@46000000: addr 0
phy_connect() failed
No ethernet found.

Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc1 is current device
SD/MMC found on device 1
526 bytes read in 3 ms (170.9 KiB/s)
Loaded env from uEnv.txt
Importing environment from mmc1 ...
Running uenvcmd ...
Core 1 is already in use. No rproc commands work
Core 2 is already in use. No rproc commands work
2370716 bytes read in 101 ms (22.4 MiB/s)
Load Remote Processor 2 with data@addr=0x82000000 2370716 bytes: Success!
506744 bytes read in 24 ms (20.1 MiB/s)
Load Remote Processor 3 with data@addr=0x82000000 506744 bytes: Success!
1579088 bytes read in 65 ms (23.2 MiB/s)
Load Remote Processor 6 with data@addr=0x82000000 1579088 bytes: Success!    (C66 firmware)
1579088 bytes read in 65 ms (23.2 MiB/s)
Load Remote Processor 7 with data@addr=0x82000000 1579088 bytes: Success!    (C66 firmware)
10399672 bytes read in 184 ms (53.9 MiB/s)
Load Remote Processor 8 with data@addr=0x82000000 10399672 bytes: Success!  (C71 firmware)

****** Then it HANGs here ! ******

-----------------------------------------------------------------

I eventually found that the hang may be in the functions appLogCpuSyncWithMaster and appLogCpuSyncWithSlave in vision_apps/utils/console_io/src/app_log_writer.c, so I added a sleep to their "infinite" loops (I call this the "sleep-version" below), as follows:

void appLogWaitTicks(uint32_t time_in_ticks)
{
    TaskP_sleep(time_in_ticks);
}


void appLogCpuSyncWithMaster(uint32_t self_cpu_id)
{
    /* TODO: Infinite wait for synchronization causing issues with QNX implementation */

    volatile uint32_t state;

    appLogSetCpuSyncState(self_cpu_id, APP_LOG_CPU_SYNC_STATE_INIT_DONE);

    do {
        //printf("jqsun--5\n");
        appLogGetCpuSyncState(self_cpu_id, &state);
        //printf("jqsun--5.0\n");
        //appLogWaitMsecs(1);
        appLogWaitTicks(1);    /* <== C66 hangs here */
        //printf("jqsun--5.1:s=%d ss=%x\n",self_cpu_id,state);
    } while(state != APP_LOG_CPU_SYNC_STATE_TEST_INIT_DONE);

    appLogSetCpuSyncState(self_cpu_id, APP_LOG_CPU_SYNC_STATE_CONFIRM_INIT_DONE);

    do {
        //printf("jqsun--7\n");
        appLogGetCpuSyncState(self_cpu_id, &state);
        //appLogWaitMsecs(1);
        appLogWaitTicks(1);
        //printf("jqsun--7.1:s=%d ss=%x\n",self_cpu_id,state);
    } while(state != APP_LOG_CPU_SYNC_STATE_RUN);
}

void appLogCpuSyncWithSlave(uint32_t slave_cpu_id)
{
    /* TODO: Infinite wait for synchronization causing issues with QNX implementation */

    volatile uint32_t state;

    appLogSetCpuSyncState(slave_cpu_id, APP_LOG_CPU_SYNC_STATE_TEST_INIT_DONE);

    do {
        appLogGetCpuSyncState(slave_cpu_id, &state);
        //printf("jqsun--8:s=%d,ss=%x\n",slave_cpu_id,state);
        if(state == APP_LOG_CPU_SYNC_STATE_INIT_DONE)
        {
            appLogSetCpuSyncState(slave_cpu_id, APP_LOG_CPU_SYNC_STATE_TEST_INIT_DONE);
        }
        //appLogWaitMsecs(1);
        appLogWaitTicks(1);
    } while(state != APP_LOG_CPU_SYNC_STATE_CONFIRM_INIT_DONE);
}

void appLogCpuSyncInit(uint32_t master_cpu_id, uint32_t self_cpu_id,
    uint32_t sync_cpu_id_list[], uint32_t num_cpus)
{
    printf("jqsun--0: m=%d,s=%d, num=%d\n",master_cpu_id,self_cpu_id,num_cpus);
    if(self_cpu_id==master_cpu_id)
    {
        uint32_t i, slave_cpu_id;

        /* master CPU, sync with each slave CPU */
        for(i=0; i<num_cpus; i++)
        {
            slave_cpu_id = sync_cpu_id_list[i];
            if(slave_cpu_id != self_cpu_id)
            {
                printf("jqsun--1, slave=%d\n", slave_cpu_id);
                appLogCpuSyncWithSlave(slave_cpu_id);
                printf("jqsun--1.1\n");
            }
        }
        /* all slaves have finished their init, now start all slave's */
        for(i=0; i<num_cpus; i++)
        {
            slave_cpu_id = sync_cpu_id_list[i];
            if(slave_cpu_id != self_cpu_id)
            {
                printf("jqsun--2, slave=%d\n", slave_cpu_id);
                appLogCpuSyncStartSlave(slave_cpu_id);
                printf("jqsun--2.1\n");
            }
        }
    }
    else
    {
        /* slave CPU, sync's with master CPU */
        printf("jqsun--3\n");
        appLogCpuSyncWithMaster(self_cpu_id);
        printf("jqsun--3.1\n");
    }
}
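
To make the handshake in the functions above easier to follow, here is a minimal single-threaded sketch of the state sequence as I read it from the code: INIT_DONE, then TEST_INIT_DONE, then CONFIRM_INIT_DONE, then RUN. It is a simulation only, not the vision_apps implementation. The `ST_*` values, the `sync_sim` struct and the `*_step` functions are hypothetical names made up for illustration, and I am assuming that appLogCpuSyncStartSlave (not shown above) is what writes the RUN state.

```c
#include <stdint.h>

/* Hypothetical numeric values; the real APP_LOG_CPU_SYNC_STATE_* constants
 * are defined in vision_apps and may differ. */
enum {
    ST_INVALID = 0,
    ST_INIT_DONE,
    ST_TEST_INIT_DONE,
    ST_CONFIRM_INIT_DONE,
    ST_RUN
};

typedef struct {
    uint32_t shared;   /* stands in for the slave's sync word in shared memory */
    int slave_phase;   /* 0: start, 1: waits TEST_INIT_DONE, 2: waits RUN, 3: running */
    int master_phase;  /* 0: start, 1: waits CONFIRM_INIT_DONE, 2: slave started */
} sync_sim;

/* One loop iteration of the slave side (mirrors appLogCpuSyncWithMaster). */
void slave_step(sync_sim *s)
{
    switch (s->slave_phase) {
    case 0:
        s->shared = ST_INIT_DONE;
        s->slave_phase = 1;
        break;
    case 1:
        if (s->shared == ST_TEST_INIT_DONE) {
            s->shared = ST_CONFIRM_INIT_DONE;
            s->slave_phase = 2;
        }
        break;
    case 2:
        if (s->shared == ST_RUN) {
            s->slave_phase = 3;
        }
        break;
    default:
        break;
    }
}

/* One loop iteration of the master side (mirrors appLogCpuSyncWithSlave).
 * Assumption: appLogCpuSyncStartSlave writes RUN; it is folded in here. */
void master_step(sync_sim *s)
{
    switch (s->master_phase) {
    case 0:
        s->shared = ST_TEST_INIT_DONE;
        s->master_phase = 1;
        break;
    case 1:
        if (s->shared == ST_INIT_DONE) {
            /* the slave started late and overwrote the word: re-assert */
            s->shared = ST_TEST_INIT_DONE;
        } else if (s->shared == ST_CONFIRM_INIT_DONE) {
            s->shared = ST_RUN;
            s->master_phase = 2;
        }
        break;
    default:
        break;
    }
}

/* Returns the round in which the slave reaches RUN, or -1 if it never does. */
int simulate_handshake(int master_first, int master_alive, int max_rounds)
{
    sync_sim s = { ST_INVALID, 0, 0 };
    int round;

    for (round = 1; round <= max_rounds; round++) {
        if (master_alive && master_first) master_step(&s);
        slave_step(&s);
        if (master_alive && !master_first) master_step(&s);
        if (s.slave_phase == 3) return round;
    }
    return -1;
}
```

In this model the handshake completes whichever side starts first, because the master re-asserts TEST_INIT_DONE when it sees INIT_DONE. If either side stops iterating its loop, the other side waits forever, which is the behaviour the TODO comment in the source warns about.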

With this change QNX booted successfully and we got the QNX prompt. Unfortunately, the C66 now hangs in the function appLogWaitTicks. The following logs were printed by the remote cores:

J7EVM@QNX:/# cd /ti_fs/vision_apps/
J7EVM@QNX:/ti_fs/vision_apps#
J7EVM@QNX:/ti_fs/vision_apps#
J7EVM@QNX:/ti_fs/vision_apps# ./vision_apps_init.sh
J7EVM@QNX:/ti_fs/vision_apps# [MCU2_0] 102.129749 s: CIO: Init ... Done !!!
[MCU2_0] 102.129791 s: ### CPU Frequency = 1000000000 Hz
[MCU2_0] 102.129820 s: APP: Init(rtos)... !!!
[MCU2_0] 102.129838 s: SCICLIENT: Init ... !!!
[MCU2_0] 102.130034 s: SCICLIENT: DMSC FW version [21.1.1--v2021.01a (Terrific Lla]
[MCU2_0] 102.130072 s: SCICLIENT: DMSC FW revision 0x15
[MCU2_0] 102.130097 s: SCICLIENT: DMSC FW ABI revision 3.1
[MCU2_0] 102.130122 s: SCICLIENT: Init ... Done !!!
[MCU2_0] 102.130143 s: UDMA: Init ... !!!
[MCU2_0] 102.131089 s: UDMA: Init ... Done !!!
[MCU2_0] 102.131135 s: MEM: Init ... !!!
[MCU2_0] 102.131170 s: MEM: Created heap (DDR_SHARED_MEM, id=0, flags=0x00000004) @ e1000000 of size 16777216 bytes !!!
[MCU2_0] 102.131228 s: MEM: Created heap (L3_MEM, id=1, flags=0x00000000) @ 3600000 of size 262144 bytes !!!
[MCU2_0] 102.131276 s: MEM: Created heap (DDR_NON_CACHE_ME, id=5, flags=0x00000000) @ d8000000 of size 16777216 bytes !!!
[MCU2_0] 102.131319 s: MEM: Init ... Done !!!
[MCU2_0] 102.131337 s: IPC: Init(rots)... !!!
[MCU2_0] 102.131386 s: IPC: 6 CPUs participating in IPC !!!
[MCU2_0] 102.136193 s: IPC: Init ... Done !!!
[MCU2_0] 102.136242 s: APP: Syncing with 5 CPUs ... !!!
[MCU2_0] 102.136292 s: jqsun--0: m=3,s=3, num=5
[MCU2_0] 102.136323 s: jqsun--1, slave=4
[MCU2_0] 102.141469 s: jqsun--1.1
[MCU2_0] 102.141498 s: jqsun--1, slave=7                 =============>Waiting slave 7(c66) here
[MCU2_1] 102.116989 s: CIO: Init ... Done !!!
[MCU2_1] 102.117033 s: ### CPU Frequency = 1000000000 Hz
[MCU2_1] 102.117059 s: APP: Init(rtos)... !!!
[MCU2_1] 102.117079 s: SCICLIENT: Init ... !!!
[MCU2_1] 102.117272 s: SCICLIENT: DMSC FW version [21.1.1--v2021.01a (Terrific Lla]
[MCU2_1] 102.117309 s: SCICLIENT: DMSC FW revision 0x15
[MCU2_1] 102.117332 s: SCICLIENT: DMSC FW ABI revision 3.1
[MCU2_1] 102.117357 s: SCICLIENT: Init ... Done !!!
[MCU2_1] 102.117378 s: UDMA: Init ... !!!
[MCU2_1] 102.118464 s: UDMA: Init ... Done !!!
[MCU2_1] 102.118512 s: MEM: Init ... !!!
[MCU2_1] 102.118546 s: MEM: Created heap (DDR_SHARED_MEM, id=0, flags=0x00000004) @ e2000000 of size 16777216 bytes !!!
[MCU2_1] 102.118608 s: MEM: Created heap (L3_MEM, id=1, flags=0x00000001) @ 3640000 of size 262144 bytes !!!
[MCU2_1] 102.118653 s: MEM: Created heap (DDR_NON_CACHE_ME, id=5, flags=0x00000000) @ d9000000 of size 117440512 bytes !!!
[MCU2_1] 102.118697 s: MEM: Init ... Done !!!
[MCU2_1] 102.118715 s: IPC: Init(rots)... !!!
[MCU2_1] 102.118759 s: IPC: 6 CPUs participating in IPC !!!
[MCU2_1] 102.123642 s: IPC: Init ... Done !!!
[MCU2_1] 102.123692 s: APP: Syncing with 5 CPUs ... !!!
[MCU2_1] 102.123742 s: jqsun--0: m=3,s=4, num=5
[MCU2_1] 102.123769 s: jqsun--3
[C6x_1 ] 102.190487 s: CIO: Init ... Done !!!
[C6x_1 ] 102.190504 s: ### CPU Frequency = 1350000000 Hz
[C6x_1 ] 102.190518 s: APP: Init(rtos)... !!!
[C6x_1 ] 102.190530 s: SCICLIENT: Init ... !!!
[C6x_1 ] 102.190704 s: SCICLIENT: DMSC FW version [21.1.1--v2021.01a (Terrific Lla]
[C6x_1 ] 102.190719 s: SCICLIENT: DMSC FW revision 0x15
[C6x_1 ] 102.190732 s: SCICLIENT: DMSC FW ABI revision 3.1
[C6x_1 ] 102.190745 s: SCICLIENT: Init ... Done !!!
[C6x_1 ] 102.190757 s: UDMA: Init ... !!!
[C6x_1 ] 102.191919 s: UDMA: Init ... Done !!!
[C6x_1 ] 102.191941 s: MEM: Init ... !!!
[C6x_1 ] 102.191956 s: MEM: Created heap (DDR_SHARED_MEM, id=0, flags=0x00000004) @ e4000000 of size 16777216 bytes !!!
[C6x_1 ] 102.191976 s: MEM: Created heap (L2_MEM, id=2, flags=0x00000001) @ 800000 of size 229376 bytes !!!
[C6x_1 ] 102.191994 s: MEM: Created heap (DDR_SCRATCH_MEM, id=4, flags=0x00000001) @ e5000000 of size 50331648 bytes !!!
[C6x_1 ] 102.192013 s: MEM: Init ... Done !!!
[C6x_1 ] 102.192025 s: IPC: Init(rots)... !!!
[C6x_1 ] 102.192048 s: IPC: 6 CPUs participating in IPC !!!
[C6x_1 ] 102.195263 s: IPC: Init ... Done !!!
[C6x_1 ] 102.195291 s: APP: Syncing with 5 CPUs ... !!!
[C6x_1 ] 102.195322 s: jqsun--0: m=3,s=7, num=5
[C6x_1 ] 102.195338 s: jqsun--3                               ==========================> C66 hangs here
[C6x_2 ] 102.274465 s: CIO: Init ... Done !!!
[C6x_2 ] 102.274488 s: ### CPU Frequency = 1350000000 Hz
[C6x_2 ] 102.274502 s: APP: Init(rtos)... !!!
[C6x_2 ] 102.274515 s: SCICLIENT: Init ... !!!
[C6x_2 ] 102.274700 s: SCICLIENT: DMSC FW version [21.1.1--v2021.01a (Terrific Lla]
[C6x_2 ] 102.274716 s: SCICLIENT: DMSC FW revision 0x15
[C6x_2 ] 102.274730 s: SCICLIENT: DMSC FW ABI revision 3.1
[C6x_2 ] 102.274744 s: SCICLIENT: Init ... Done !!!
[C6x_2 ] 102.274759 s: UDMA: Init ... !!!
[C6x_2 ] 102.275920 s: UDMA: Init ... Done !!!
[C6x_2 ] 102.275943 s: MEM: Init ... !!!
[C6x_2 ] 102.275958 s: MEM: Created heap (DDR_SHARED_MEM, id=0, flags=0x00000004) @ e8000000 of size 16777216 bytes !!!
[C6x_2 ] 102.275978 s: MEM: Created heap (L2_MEM, id=2, flags=0x00000001) @ 800000 of size 229376 bytes !!!
[C6x_2 ] 102.275996 s: MEM: Created heap (DDR_SCRATCH_MEM, id=4, flags=0x00000001) @ e9000000 of size 50331648 bytes !!!
[C6x_2 ] 102.276015 s: MEM: Init ... Done !!!
[C6x_2 ] 102.276026 s: IPC: Init(rots)... !!!
[C6x_2 ] 102.276048 s: IPC: 6 CPUs participating in IPC !!!
[C6x_2 ] 102.279289 s: IPC: Init ... Done !!!
[C6x_2 ] 102.279317 s: APP: Syncing with 5 CPUs ... !!!
[C6x_2 ] 102.279347 s: jqsun--0: m=3,s=8, num=5
[C6x_2 ] 102.279362 s: jqsun--3
[C7x_1 ] 102.484258 s: CIO: Init ... Done !!!
[C7x_1 ] 102.484272 s: ### CPU Frequency = 1000000000 Hz
[C7x_1 ] 102.484282 s: APP: Init(rtos)... !!!
[C7x_1 ] 102.484291 s: SCICLIENT: Init ... !!!
[C7x_1 ] 102.484454 s: SCICLIENT: DMSC FW version [21.1.1--v2021.01a (Terrific Lla]
[C7x_1 ] 102.484468 s: SCICLIENT: DMSC FW revision 0x15
[C7x_1 ] 102.484479 s: SCICLIENT: DMSC FW ABI revision 3.1
[C7x_1 ] 102.484489 s: SCICLIENT: Init ... Done !!!
[C7x_1 ] 102.484498 s: UDMA: Init ... !!!
[C7x_1 ] 102.485337 s: UDMA: Init ... Done !!!
[C7x_1 ] 102.485349 s: MEM: Init ... !!!
[C7x_1 ] 102.485360 s: MEM: Created heap (DDR_SHARED_MEM, id=0, flags=0x00000004) @ 100000000 of size 268435456 bytes !!!
[C7x_1 ] 102.485381 s: MEM: Created heap (L3_MEM, id=1, flags=0x00000001) @ 70020000 of size 8159232 bytes !!!
[C7x_1 ] 102.485399 s: MEM: Created heap (L2_MEM, id=2, flags=0x00000001) @ 64800000 of size 491520 bytes !!!
[C7x_1 ] 102.485416 s: MEM: Created heap (L1_MEM, id=3, flags=0x00000001) @ 64e00000 of size 16384 bytes !!!
[C7x_1 ] 102.485433 s: MEM: Created heap (DDR_SCRATCH_MEM, id=4, flags=0x00000001) @ ec000000 of size 268435456 bytes !!!
[C7x_1 ] 102.485451 s: MEM: Init ... Done !!!
[C7x_1 ] 102.485459 s: IPC: Init(rots)... !!!
[C7x_1 ] 102.485473 s: IPC: 6 CPUs participating in IPC !!!
[C7x_1 ] 102.487281 s: IPC: Init ... Done !!!
[C7x_1 ] 102.487294 s: APP: Syncing with 5 CPUs ... !!!
[C7x_1 ] 102.487316 s: jqsun--0: m=3,s=9, num=5
[C7x_1 ] 102.487329 s: jqsun--3
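
Because the sync loops wait forever, a stuck core only shows up as missing log lines like the ones above. One option for debugging, sketched below under my own assumptions, is a bounded poll that gives up after a fixed number of reads and reports the last state seen. `sync_wait_for_state`, `sync_delay_fn`, `fake_peer` and `demo_bounded_wait` are hypothetical names and not part of vision_apps. The delay callback stands in for appLogWaitTicks(1), and the state values 1 and 2 are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback invoked between polls; stands in for appLogWaitTicks(1). */
typedef void (*sync_delay_fn)(void *ctx);

/* Hypothetical helper: read *state up to max_polls times and return true as
 * soon as it equals `expected`. On return, *last_seen (if not NULL) holds the
 * last value read, so a timeout can be logged with the observed state. */
bool sync_wait_for_state(const volatile uint32_t *state, uint32_t expected,
                         uint32_t max_polls, sync_delay_fn delay, void *ctx,
                         uint32_t *last_seen)
{
    uint32_t seen = 0;
    bool matched = false;
    uint32_t i;

    assert(state != NULL);

    for (i = 0; i < max_polls; i++) {
        seen = *state;
        if (seen == expected) {
            matched = true;
            break;
        }
        if (delay != NULL) {
            delay(ctx);
        }
    }
    if (last_seen != NULL) {
        *last_seen = seen;
    }
    return matched;
}

/* Demo only: a fake peer that writes `value` into *state on its
 * `write_after`-th delay call, simulating the other core responding late. */
typedef struct {
    volatile uint32_t *state;
    uint32_t write_after;
    uint32_t value;
    uint32_t calls;
} fake_peer;

static void fake_peer_delay(void *ctx)
{
    fake_peer *p = (fake_peer *)ctx;
    p->calls++;
    if (p->calls == p->write_after) {
        *p->state = p->value;
    }
}

/* Returns 1 if the wait succeeded within max_polls reads, 0 on timeout. */
int demo_bounded_wait(uint32_t peer_write_after, uint32_t max_polls)
{
    volatile uint32_t word = 1u;                      /* placeholder for INIT_DONE */
    fake_peer peer = { &word, peer_write_after, 2u, 0u };  /* 2u: TEST_INIT_DONE */

    return sync_wait_for_state(&word, 2u, max_polls,
                               fake_peer_delay, &peer, NULL) ? 1 : 0;
}
```

With a peer that answers on the third delay, `demo_bounded_wait(3, 10)` succeeds. With a peer that would only answer on the twentieth, `demo_bounded_wait(20, 10)` times out instead of hanging. On the target, the timeout path could print the last state seen so the log shows which core stopped responding and in which phase.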

I have previously tried Linux+RTOS (Processor SDK Linux version 07_03_00_05, Processor SDK RTOS version 07_03_00_07), and the remote cores worked fine on our custom board. (We don't have much time to port Linux+RTOS 08_00_00_12 to our custom board.)

I also tried the "sleep-version" firmware on the TDA4 EVM, and the C66 hangs there as well.

The questions are:

(1) What are the differences in the remote-core firmware between Linux+RTOS mode and QNX+RTOS mode?

(2) Why do the remote cores work fine in Linux+RTOS mode but not in QNX+RTOS mode on our custom board? What could cause this problem? Has any other customer encountered this issue?

(3) Why does the "sleep-version" cause the C66 to hang on the TDA4 EVM? Is there a timing issue?

(4) Is it a hardware bug? What should we check on our board?

Please give us some guidance on resolving this problem.

Thanks

Jianqiang