TDA4VH-Q1: TIDL Import & Quantization Issues with Neuflow Optical Flow Model – Invalid Output & QDQ Import Failure

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: TDA4VH, AM69A

Hi TI Support,

I'm working on deploying a feature extractor to the TDA4VH using TIDL with C7x DSP acceleration. The model runs correctly on CPU using:

providers = ["CPUExecutionProvider"]

However, when switching to:
providers = ["TIDLCompilationProvider", "CPUExecutionProvider"]

I encounter either TIDL compilation errors or unusable output (all zeros or NaNs), depending on the setup.
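For reference, the session setup follows the edgeai-tidl-tools onnxrt_ep.py example roughly as below; the delegate option keys shown are illustrative and may differ between releases:

import onnxruntime as ort

# TIDL delegate options, following the edgeai-tidl-tools examples
# (treat the keys and values as illustrative, not exact).
compile_options = {
    "tidl_tools_path": "/path/to/tidl_tools",
    "artifacts_folder": "./model-artifacts",
    "tensor_bits": 8,        # target 8-bit fixed point on the C7x
    "debug_level": 1,
}

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

sess = ort.InferenceSession(
    "model.onnx",
    sess_options=so,
    providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
    provider_options=[compile_options, {}],
)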

✅ Working Configuration

  • Model: ONNX (opset 18), running with FP32 on CPU

  • Output: Accurate

❌ Failing Configurations (with DSP)

I’ve tested the following compilation and quantization setups using compile.py (based on onnxrt_ep.py from edgeai-tidl-tools):

  1. QDQ quantized model (via quantize_static from ONNX; see the sketch after this list)

    • ❌ Fails at compile time

    • TIDL import error: "missing inputs", topological sort failure (see below)

  2. Quantization during TIDL compilation

    • ❌ Segfaults at TIDL compilation (session.run() with inference_mode:0)

    • With a warning:
      Conv node failure: Name '/custom_backbone/block1_2/conv1/Conv'
      Status Message: Input channels C is not equal to kernel channels * group.
      C: 1 kernel channels: 3 group:

      After this warning, the process segfaults immediately.

    • ✅ I resolved the Conv parameter issue (kernel_size=11, stride=4, padding=5), allowing the model to compile.
      However, the output range is now enormous (-3.5e+33 to +2.6e+33), suggesting a quantization or numeric overflow issue.
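For completeness, the QDQ model in option 1 was produced roughly as follows; the calibration data reader, input names, and quantization types are illustrative of my setup rather than exact code:

from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static)

class FlowPairReader(CalibrationDataReader):
    """Feeds the ~100 calibration image pairs one at a time (input names illustrative)."""
    def __init__(self, pairs):
        self._it = iter([{"img1": a, "img2": b} for a, b in pairs])
    def get_next(self):
        return next(self._it, None)

quantize_static(
    "model_fp32.onnx",
    "model_qdq.onnx",
    calibration_data_reader=FlowPairReader(calib_pairs),  # calib_pairs: my 100 test image pairs
    quant_format=QuantFormat.QDQ,      # insert QuantizeLinear/DequantizeLinear pairs
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)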

QDQ Import Failure Details

When attempting to compile the QDQ-format model (option 1), I get this TIDL import error:

[TIDL Import] [PARSER] ERROR: Layer 53, custom_backbone.block1_dd.conv_block.conv1.weight_DequantizeLinear_Output/duplicated:custom_backbone.block1_dd.conv_block.conv1.weight_DequantizeLinear_Output/duplicated is missing inputs in the network and cannot be topologically sorted. Missing inputs are: 
  -- [tidl_import_common.cpp, 4378]
  Input 0: custom_backbone.block1_dd.conv_block.conv1.weight_DequantizeLinear_Output/duplicated, dataId=116
[TIDL Import]  ERROR: - Failed in function: tidl_optimizeNet -- [tidl_import_core.cpp, 2602]
[TIDL Import]  ERROR: Network Optimization failed - Failed in function: TIDL_runtimesOptimizeNet -- [tidl_runtimes_import_common.cpp, 1287]

It appears TIDL cannot resolve a duplicated or reused DequantizeLinear node, which seems to be inserted during ONNX QDQ export.

Environment

  • Platform: TDA4VH

  • Python: 3.10.12

  • ONNX opset: 18

  • Quantization: onnxruntime.quantization.quantize_static

  • TIDL tools commit: 1b75e86e79cfddb8f6e181014e6343e89765883d

  • Compilation: Based on edgeai-tidl-tools/onnxrt_ep.py

  • Calibration: performed using 100 test images in both options (quantize_static and TIDL quantization)

❓ Key Questions

  1. Are there known TIDL quantization or import issues with QDQ models (especially with reused weights or shared DequantizeLinear nodes)?

  2. What causes the import error above, and how can I structure QDQ graphs to avoid this?

  3. Does this kernel + stride combination require special handling in calibration?

  4. Is there a known cause for this enormous output range after quantization?

Available for Debugging

I can provide the following artifacts if helpful:

  • ONNX models

  • Layer execution summary (tempDir)

  • Calibration data + output examples

Any help would be greatly appreciated. Please let me know what additional files or logs you’d like to see.

Thanks in advance!

Victoria

  • Hi Victoria,

    Excellent detail on this post. Which version of TIDL are you running? Can you please provide the QDQ model, the command line you used to import/run it, and the configuration files? The first step is to duplicate your results (or errors), and we can debug from there.

    Regards,
    Chris

  • Hi Chris,

    Thanks for the quick response.

    I'm using TIDL Tools version 10.1.0.0 on the AM69A platform (branch: rel_10_01, commit: 1b75e86e79cfddb8f6e181014e6343e89765883d).

    I'm attaching the following files to help you replicate my setup:

    QDQ ONNX model – The pre-quantized model I'm compiling.

    compile.py script – Contains the model import logic and TIDL delegate configuration. The optimizer step is currently commented out for debugging purposes, but feel free to enable it if needed. The script supports two modes of input during compilation: it uses real optical flow image pairs if test_img_path is provided, and otherwise falls back to generating dummy float32 inputs with the correct shape and value range (roughly as sketched below). Since the model is already quantized (QDQ), dummy input should be sufficient for compilation.
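    That fallback looks roughly like this; the input names, shapes, and the load_image_pair helper are illustrative rather than the exact code:

    import numpy as np

    if test_img_path:
        # real optical-flow image pair (load_image_pair is a helper in my script)
        img1, img2 = load_image_pair(test_img_path)
    else:
        # dummy float32 inputs with the expected shape and value range
        shape = (1, 3, 384, 512)    # illustrative; matches the model's input shape
        img1 = np.random.uniform(0.0, 1.0, shape).astype(np.float32)
        img2 = np.random.uniform(0.0, 1.0, shape).astype(np.float32)

    inputs = {"img1": img1, "img2": img2}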

    Let me know if you need anything else.

    Thanks again,
    Victoria

    tidl_debug_files.zip

  • Hi Victoria,

    Looking at your model, this does not look right. There are no inputs for the DequantizeLinear node.    Is this expected in the model?

    custom_backbone.block1_dd.conv_block.conv1.weight_DequantizeLinear

    Regards,

    Chris

  • Hi Chris,

    Thanks for your input.

    Regarding the observation that the DequantizeLinear node appears to have no inputs: this is expected in ONNX when the input is a quantized weight initializer. In Netron, initializers are shown as implicit inputs and not as connected nodes, but the model is structurally valid.
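    As a quick sanity check, the onnx Python API confirms that these nodes read their inputs from graph initializers; a small illustrative snippet:

    import onnx

    m = onnx.load("QDQmodel.onnx")
    inits = {init.name for init in m.graph.initializer}
    for node in m.graph.node:
        if node.op_type == "DequantizeLinear" and node.input[0] in inits:
            # x, x_scale (and optionally x_zero_point) all come from initializers,
            # which is why Netron draws the node without incoming edges.
            print(node.name, "reads weight initializer", node.input[0])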

    As for the error during TIDL import:

    [TIDL Import] [PARSER] ERROR: Layer 53, ... is missing inputs in the network and cannot be topologically sorted.

    I’ve investigated this further and created a minimal example to isolate the issue. It appears the error is triggered when
    a Conv node is executed on the CPU, but a downstream DequantizeLinear node (connected to the Conv) is placed in the
    TIDL subgraph. In this case, the model fails to compile with the topological sort error mentioned above.

    However, if the DequantizeLinear node is also executed on the CPU, compilation succeeds. This suggests the issue occurs when TIDL attempts to process a DequantizeLinear node that supplies weights to a CPU-executed Conv.
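    To make the pattern concrete, the core of the minimal model looks roughly like this when built with onnx.helper; shapes and names are illustrative, and the actual minimal model also carries the activation Q/DQ pairs:

    import numpy as np
    import onnx
    from onnx import TensorProto, helper, numpy_helper

    # A Conv whose weight arrives through a DequantizeLinear fed by initializers:
    # the pattern that fails when the Conv stays on the CPU but the DQ lands in the TIDL subgraph.
    w_q = numpy_helper.from_array(np.zeros((8, 3, 3, 3), dtype=np.int8), "conv1.weight_q")
    w_scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), "conv1.weight_scale")
    w_zp = numpy_helper.from_array(np.array(0, dtype=np.int8), "conv1.weight_zp")

    dq_w = helper.make_node(
        "DequantizeLinear",
        ["conv1.weight_q", "conv1.weight_scale", "conv1.weight_zp"],
        ["conv1.weight"], name="conv1.weight_DequantizeLinear")
    conv = helper.make_node(
        "Conv", ["x", "conv1.weight"], ["y"],
        name="conv1", kernel_shape=[3, 3], pads=[1, 1, 1, 1])

    graph = helper.make_graph(
        [dq_w, conv], "min_repro",
        [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 32, 32])],
        [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 8, 32, 32])],
        initializer=[w_q, w_scale, w_zp])
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
    onnx.checker.check_model(model)
    onnx.save(model, "min_repro.onnx")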

    I’ve attached a combined image (combined.png) showing both configurations:

    • Top: Working configuration (Conv and DequantizeLinear on CPU)

    • Bottom: Failing configuration (DequantizeLinear in TIDL subgraph)

    In summary, it appears that TIDL currently struggles with duplicated or shared nodes when they are split between the CPU and DSP subgraphs. I can share the minimal ONNX model if that would help with debugging.

    That said, while this example highlights one specific issue, it doesn't resolve the problems in the full model. Simply forcing DequantizeLinear to run on the CPU isn't a general solution. This seems to expose one part of a broader problem, and further investigation is definitely needed.

    Best regards,
    Victoria

  • Hi Victoria,

    There is probably an issue with TIDL, and I can put in a Jira for a fix with the Dev team.  In the meantime, can you try placing the offending layers in the deny list to verify?  I want to ensure there is a workaround for your application and validate your theory before adding the Jira.

    Thanks,

    Chris

  • Hi Chris,

    Thanks for the suggestion.

    Yes, I can deny-list all the Dequantize layers, which seem to be the root of the issue, especially when they appear in duplicate. However, doing so results in most of the remaining nodes being isolated in separate TIDL blocks. This quickly hits the maximum number of TIDL blocks, and as a result, most of the model ends up being executed on the CPU.
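    For reference, this is roughly how I deny-list them in the delegate options; the deny_list:layer_name key follows the edgeai-tidl-tools documentation for recent releases, so please correct me if the option name differs in 10.1:

    compile_options = {
        "tidl_tools_path": "/path/to/tidl_tools",
        "artifacts_folder": "./model-artifacts",
        "tensor_bits": 8,
        # comma-separated node names forced onto the CPU
        "deny_list:layer_name": "custom_backbone.block1_dd.conv_block.conv1.weight_DequantizeLinear",
    }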

    As I mentioned earlier, compiling with FP32 for CPU execution works without any issues, but unfortunately, it doesn't meet my inference time requirements. So while the workaround technically functions, it's not viable for my application's performance targets.

    Let me know how you'd like to proceed.

    Best regards,
    Victoria

  • Hi Victoria,

    Can you please send me the minimal ONNX model to demonstrate this?  I would need to show this to the Dev team.  

    Regards,

    Chris

  • Hi Chris,

    Yes, I can send the minimal ONNX model that reproduces the issue for the dev team to look at.

    I’ve partially worked around the problem by quantizing the model using ONNX’s quantize_static with DedicatedQDQPair=True. This prevents DequantizeLinear nodes from being shared between multiple consumers, which TIDL has trouble handling. However, this setting doesn’t apply to Dequantize nodes created from initializer weights, which can still have multiple consumers. When that happens, I get the following TIDL error:

    [TIDL Import] [PARSER] ERROR: Unable to merge Dequantize - /custom_backbone/block1_1/relu_2/LeakyRelu_output_0_DequantizeLinear_Output_1 upwards - DQ without initializer? -- [tidl_import_common.cpp, 7103]
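    For reference, the workaround quantization call looks roughly like this; the calibration reader and quantization types are the same as before and are shown only to illustrate the extra option:

    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        "model_fp32.onnx",
        "model_qdq_dedicated.onnx",
        calibration_data_reader=reader,            # same calibration data reader as before
        quant_format=QuantFormat.QDQ,
        activation_type=QuantType.QUInt8,
        weight_type=QuantType.QInt8,
        extra_options={"DedicatedQDQPair": True},  # dedicated Q/DQ pair per consumer (activations)
    )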

    To demonstrate both cases, I’m attaching two minimal models:

    1. QDQmodel_debug1.onnx:

      • Compiles successfully as-is.

      • Fails if I deny /custom_backbone/block1_1/conv2/Conv, with the topological sort error.

      • Compiles again if I also deny /custom_backbone/block1_1/conv2.weight_DequantizeLinear.

    2. QDQmodel_debug2.onnx:

      • Quantized with DedicatedQDQPair=True.

      • Fails with the “DQ without initializer?” error due to a shared DequantizeLinear node not tied to an initializer.

    Let me know if this helps validate the behavior and workaround. I appreciate your help in raising a Jira for this, and happy to provide anything else needed.

    Best regards,
    Victoria

    QDQmodel_debug.zip

  • Hi Victoria,

    I can get both models to import as regular models (non-QDQ).  Instead of me trying a bunch of different TIDL configurations, can you please send me your import and inference config files?

    Thanks,

    Chris

  • Hi Chris,

    Thanks for the update.

    Just to clarify the context: my goal is to compile and deploy a quantized (QDQ) model, as it’s intended to run on hardware that requires quantization. While I understand that importing the model as a regular (non-QDQ) model may work, this doesn’t align with the intended deployment.

    In the compile script I shared earlier, all the import and compile parameters are included. I don’t have separate import or inference config files beyond that. Also, inference configuration isn’t relevant at this point—I’m currently focused on resolving the import/compilation issue during local testing on my desktop.

    Could you help clarify what might be causing the errors related to missing inputs and Dequantize nodes without initializers? Any insight on how to move forward with compiling the quantized model would be appreciated.

    Thanks again for your help.

    Best regards,
    Victoria

  • Hi Victoria,

    Apologies for the late reply.  What I mean by import configuration files is either the import.txt files if you are using TIDLRT or the entry in model_configs.py if you are using OSRT.   What I want to see is how you are attempting an import into TIDL.  The compile.py script you sent is not working due to missing image files and path failures.  I want to eliminate all variable factors, so I would like to test with the TIDL tools from the Development team.

    FileNotFoundError: [Errno 2] No such file or directory: '/home/emibr12/neuflow/low/00001_img1.ppm'
    ************ in TIDL_subgraphRtDelete ************

    Regards,

    Chris

  • Hi Victoria,

    I have been able to replicate the issue with a toy QDQ model.  I entered a Jira for this issue: TIDL-7827.  The development team has been assigned to it.

    Regards,

    Chris

  • Hi Victoria,

    I have a possible solution for your model. I was able to test with your compile script, and it worked up to the point of loading the images; since I do not have your .ppm files, I could not go further than that.

    To fix the model, you first need to clone https://github.com/daquexian/onnx-simplifier

    cd to onnx-simplifier

    onnxsim QDQmodel.onnx QDQmodel_simp.onnx

    Then modify your compile script to use the new simplified model name. 
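    Alternatively, if it is easier to keep everything in your compile script, the same simplification can be done from Python (assuming onnxsim is pip-installed):

    import onnx
    from onnxsim import simplify

    model = onnx.load("QDQmodel.onnx")
    model_simp, check = simplify(model)    # returns (simplified model, success flag)
    assert check, "onnx-simplifier could not validate the simplified model"
    onnx.save(model_simp, "QDQmodel_simp.onnx")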

    Regards,

    Chris

    2112.compile.txt