[FAQ] EDGE-AI-STUDIO: Failing to compile all models for TI Deep Learning (TIDL) on SDK 9.x due to Segmentation Fault on host PC [AM6xA and TDA4x processors]

Part Number: EDGE-AI-STUDIO
Other Parts Discussed in Thread: AM68A, AM69A, AM67A, TDA4VM, TDA4VL, TDA4VH

Tool/software:

Question:

I am compiling a model with edgeai-tidl-tools for one of the 9.x SDKs (9.0, 9.1, 9.2).

Late in the compilation logs, a segmentation fault is reported, like the following:

Running with OTF buffer optimizations
Segmentation fault (core dumped)
Could not open /USERS_PATH/edgeai-tidl-tools/model-artifacts/cl-ort-resnet18-v1/tempDir/subgraph_0_tidl_net/perfSimInfo.bin
WARNING: [TIDL_E_DATAFLOW_INFO_NULL] Network compiler returned with error or didn't executed, this model can only be used on PC/Host emulation mode, it is not expected to work on target/EVM.

However, the process did not exit with an error or report a failure to the user-level program, so I assumed the result might be usable. I ran the model on the PC, and it even gave some correct results.

When I then try to run the model on the device, I get error messages like the following:

TIDL_RT_OVX: ERROR: Create OpenVX graph failed
TIDL_RT_OVX: ERROR: Verify OpenVX graph failed
0.73658s: VX_ZONE_ERROR:[tivxKernelTIDLCreate:907] Network version - 0x00000000, Expected version - 0x20240401
0.73817s: VX_ZONE_ERROR:[ownContextSendCmd:875] Command ack message returned failure cmd_status: -1
0.73824s: VX_ZONE_ERROR:[ownNodeKernelInit:590] Target kernel, TIVX_CMD_NODE_CREATE failed for node node_79
0.73826s: VX_ZONE_ERROR:[ownNodeKernelInit:591] Please be sure the target callbacks have been registered for this core
0.73828s: VX_ZONE_ERROR:[ownNodeKernelInit:592] If the target callbacks have been registered, please ensure no errors are occurring within the create callback of this kernel
0.73843s: VX_ZONE_ERROR:[ownGraphNodeKernelInit:608] kernel init for node 0, kernel com.ti.tidl:1:1 ... failed !!!
0.73848s: VX_ZONE_ERROR:[vxVerifyGraph:2159] Node kernel init failed
0.73868s: VX_ZONE_ERROR:[vxVerifyGraph:2213] Graph verify failed
0.73888s: VX_ZONE_ERROR:[ownGraphScheduleGraphWrapper:885] graph is not in a state required to be scheduled
0.73890s: VX_ZONE_ERROR:[vxProcessGraph:813] schedule graph failed
0.73892s: VX_ZONE_ERROR:[vxProcessGraph:818] wait graph failed
ERROR: Running TIDL graph ... Failed !!!

What might be happening here? How do I debug this or work around it?

Devices impacted:

  • AM62A, AM67A, AM68A, AM69A
  • TDA4VM, TDA4AL, TDA4VL, TDA4VE, TDA4AEN, TDA4VEN, TDA4AEP, TDA4VH

SDKs impacted:

  • 9.0, 9.1, 9.2 for the Edge AI Linux SDK using the edgeai-tidl-tools repo, and for the ADAS/PSDK-RTOS SDK using the built-in tidl_tools
  • This is a known issue for 9.x SDK TIDL tooling.

    The cause is a system-level issue related to the Linux kernel of the host x86 PC used for compiling / importing the model with TIDL.

    The 9.x TIDL tools are affected by this issue, which causes a segmentation fault late in model compilation.
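Because the compile step can finish without a failing exit status, one way to catch this early is to scan the captured compilation log for the messages quoted in the question. The following is a minimal sketch, not part of edgeai-tidl-tools; the function name `find_compile_failures` and the idea of saving the compiler output to a log file are assumptions for illustration.

```python
import re

# Patterns taken from the log excerpt quoted in the question.
# This list is illustrative and may not cover every failure message.
FAILURE_PATTERNS = (
    re.compile(r"Segmentation fault"),
    re.compile(r"TIDL_E_DATAFLOW_INFO_NULL"),
    re.compile(r"Could not open .*perfSimInfo\.bin"),
)


def find_compile_failures(log_text):
    """Return (line_number, line) pairs for log lines matching a failure pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

For example, after redirecting the compile output to a file (the name `compile.log` is hypothetical), `find_compile_failures(open("compile.log").read())` returns an empty list when none of these markers appear. A non-empty result suggests the artifacts should not be deployed to the device.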

    Most Linux distributions (such as Ubuntu, our main Linux platform for host-side development) have by now moved to Linux kernel 6.8.0 or newer.

    • Check your kernel version with the command `uname -r`
    • For example, my machine upgraded to kernel 6.8.0-57-generic in 2H 2024 as part of a standard update

    These 9.x TIDL tools were validated against Linux kernel 6.5.5.
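The kernel check can also be scripted before starting a long compilation. The sketch below is a rough host-side helper, not an official TI tool; the function names are hypothetical, and the only version taken from this FAQ is the validated 6.5.5 kernel. A newer kernel does not prove the segmentation fault will occur, only that the host is outside the validated configuration.

```python
import platform
import re

# Kernel version the 9.x TIDL tools were validated against, per this FAQ.
VALIDATED_KERNEL = (6, 5, 5)


def parse_kernel_release(release):
    """Turn a release string such as '6.8.0-57-generic' into (6, 8, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_newer_than_validated(release):
    """Return True if the kernel release is newer than the validated kernel."""
    return parse_kernel_release(release) > VALIDATED_KERNEL


def host_is_newer_than_validated():
    """Check the running host; platform.release() matches `uname -r` on Linux."""
    return is_newer_than_validated(platform.release())
```

With the kernel mentioned above, `is_newer_than_validated("6.8.0-57-generic")` returns `True`, while `is_newer_than_validated("6.5.5")` returns `False`.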

     

    To resolve or work around this issue, please consider the following options:

    • Downgrade the Linux kernel to 6.5.5
      • This has been validated – downgrading the Linux kernel caused this Segmentation Fault to stop happening
      • This will likely impact other aspects of your machine, such as graphics drivers.
      • A virtual machine is another alternative, but has not been validated
    • Upgrade your SDK to a version newer than 9.2
      • SDK 10.0 and beyond resolves this issue – the root cause has been patched and included in newer tooling updates
      • Models will require recompilation
    • If you are on 9.2 SDK, use the backward-compatible 10_00_00_07 release/tag of edgeai-tidl-tools
      • This requires running a script on the device / EVM to download updated firmware and libraries

    Please see the following FAQ for more details on SDK versioning and TIDL: