TDA4VM: Segfault when running automated mixed precision algorithm

Part Number: TDA4VM

Hi all,

I am experimenting with the TIDL automated mixed precision algorithm on the TDA4VM board. I use EdgeAI TIDL Tools version 10_01_04_00 and the TI-provided cl-ort-resnet18-v1 model with the mixed_precision_factor parameter set to 1.3 (tolerating a 30% increase in latency compared to 8-bit inference). However, the algorithm fails during the quantization process with the following error:

-------- Running Calibration in Float Mode to Collect Tensor Statistics --------
Segmentation fault (core dumped)
[TIDL Import]  ERROR: Failed to run calibration pass, system command returned error: 35584 -- [tidl_import_core.cpp, 678]
[TIDL Import]  ERROR: Failed to run Calibration - Failed in function: tidlRunQuantStatsTool -- [tidl_import_core.cpp, 1746]
[TIDL Import] [QUANTIZATION] ERROR:  - Failed in function: TIDL_quantStatsFixedOrFloat -- [tidl_import_quantize.cpp, 3969]
[TIDL Import]  ERROR:  - Failed in function: TIDL_executeAutomatedMixedPrecision -- [tidl_import_core.cpp, 4037]
[TIDL Import]  ERROR:  - Failed in function: TIDL_import_backend -- [tidl_import_core.cpp, 4419]
[TIDL Import]  ERROR:  - Failed in function: TIDL_runtimesPostProcessNet -- [tidl_runtimes_import_common.cpp, 1414]
[TIDL Import] [PARSER] ERROR:  - Failed in function: TIDL_subgraphImport -- [tidl_onnxRtImport_EP.cpp, 1737]
[TIDL Import] [PARSER] ERROR:  - Failed in function: TIDL_computeInvokeFunc -- [tidl_onnxRtImport_EP.cpp, 2511]
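
For reference, the value 35584 in the first error line can be decoded by hand. This is a minimal sketch, assuming the import tool prints the raw return value of a system()-style call on Linux, where the child's exit code sits in bits 8 to 15 of the status and the shell reports a child killed by signal N as exit code 128 + N. The helper names are hypothetical and are not part of TIDL.

```cpp
#include <cassert>

// Hypothetical helper: extract the exit code from a wait-style status word.
// Assumes the Linux encoding, where the exit code occupies bits 8..15.
constexpr int exitCodeFromStatus(int status) {
    return (status >> 8) & 0xFF;
}

// Hypothetical helper: shells report "child killed by signal N" as 128 + N.
// Returns 0 when the exit code does not follow that convention.
constexpr int signalFromShellExitCode(int exitCode) {
    return exitCode > 128 ? exitCode - 128 : 0;
}
```

Under these assumptions, 35584 decodes to exit code 139, that is 128 + 11, and signal 11 is SIGSEGV on Linux. This matches the "Segmentation fault (core dumped)" line printed just before the error.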

I traced the issue to the point where the PC_dsp_test_dl_algo.out tool is invoked to collect the statistics for model quantization. This tool hits a segmentation fault after calling the function getLayerIdToExecute() in c7x-mma-tidl/ti_dl/algo/src/tidl_alg.c. The function returns a layerId that is out of bounds, so invalid data is accessed afterwards. I suspect that the network binary file used by the tool somehow gets corrupted, but I am not sure where or why. The tricky part is that the behavior is random: sometimes the algorithm runs without issues. I tried other (customer-specific) models and observed the same behavior, so I do not think the issue is with the model.
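
A defensive check of the kind below would turn such an out-of-bounds index into a reportable error instead of a crash. It is only a sketch: DemoLayer, isValidLayerId, and layerOrNull are hypothetical names, and the real signature of getLayerIdToExecute() and the layout of the TIDL layer array are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a per-layer record; not the real TIDL type.
struct DemoLayer {
    std::int32_t layerType;
};

// True only when layerId can safely index an array of numLayers entries.
constexpr bool isValidLayerId(std::int32_t layerId, std::int32_t numLayers) {
    return layerId >= 0 && layerId < numLayers;
}

// Returns the requested layer, or nullptr when the index is out of range,
// so the caller can report an error instead of reading invalid memory.
inline const DemoLayer* layerOrNull(const DemoLayer* layers,
                                    std::int32_t numLayers,
                                    std::int32_t layerId) {
    return isValidLayerId(layerId, numLayers) ? &layers[layerId] : nullptr;
}
```

A check like this would not remove the root cause, but it would make a failure like this one easier to localize.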

I am using the Docker environment as recommended in the EdgeAI TIDL Tools documentation, so I suppose that potential environment issues can be ruled out.

Has anyone observed similar behavior on their end, and do you know how to address it?

Note: I also tried to run the model quantization with a more recent EdgeAI TIDL Tools version (11_01_06_00), but the issue is the same.

Best regards,

Mladen

  • Hi, I have debugged the issue and made a change. Can you try with the following tools (10_01_04_00)?
    3162.tidl_tools.tar.gz

    Regards

    Vaibhav

  • Vaibhav Kumar,

    Thanks for addressing the issue and coming back with the fix.

    We also managed to debug the issue and find a fix. On our end, we memset the sTIDL_Network_t structure in the function TIDL_executeAutomatedMixedPrecision(), located in c7x-mma-tidl/ti_dl/utils/tidlModelImport/tidl_import_core.cpp. We also fixed the part that releases the previously allocated resources (even though this does not seem to affect the crash). I attach the diff with all the changes for your reference. Can you confirm that your fix includes the same change?

    I'll also try with your binaries and come back with the results.

    Regards,

    Mladen

  • Yes, same as yours: I also zero-initialized the sTIDL_Network_t structure in the same function. I don't see any diff attached.

    Regards

  • Sorry, I forgot to attach it. Here it is.

    diff -ruN a/c7x-mma-tidl/ti_dl/utils/tidlModelImport/tidl_import_core.cpp b/c7x-mma-tidl/ti_dl/utils/tidlModelImport/tidl_import_core.cpp
    --- a/c7x-mma-tidl/ti_dl/utils/tidlModelImport/tidl_import_core.cpp	2024-12-12 17:18:42.000000000 +0100
    +++ b/c7x-mma-tidl/ti_dl/utils/tidlModelImport/tidl_import_core.cpp	2026-04-24 08:28:53.986258000 +0200
    @@ -3945,6 +3945,7 @@
     
       strcpy(inConfigFilename, TIDL_augmentCharArrayWithSuffix(inConfigFileNameOrig, "_float").c_str());
       sTIDL_Network_t * tidlNetStructureFloat = new sTIDL_Network_t;
    +  memset(tidlNetStructureFloat, 0, sizeof(sTIDL_Network_t));
       TIDL_updateConfigParameters(&gParams,-1,-1,-1,-1,-1,gParams.numFramesBiasCalibration/4);
       gParams.writeTraceLevel = 3;
       TIDL_IMPORT_CHECK_AND_RETURN(TIDL_quantStatsFixedOrFloat(&orgTIDLNetStructure,
    @@ -4330,8 +4331,13 @@
     
           /* Execute the algorithm */
           TIDL_IMPORT_CHECK_AND_RETURN(TIDL_executeAutomatedMixedPrecision(layerIndex, orgTIDLNetStructureOrig, &configParamsOrig), "");
    +      
    +      TIDL_freeModelParams(orgTIDLNetStructureOrig, orgTIDLNetStructure.numLayers);
     
    -      delete orgTIDLNetStructureOrig;
    +      if ( orgTIDLNetStructureOrig != NULL )
    +      {
    +        delete orgTIDLNetStructureOrig;
    +      }
         }
         /* Needs review on when exactly we want to abort if this function fails */
         TIDL_importBitDepthProtoTxt(&orgTIDLNetStructure, &gParams);
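
The first hunk works because a plain `new T` for a struct without a user-provided constructor leaves its members with indeterminate values, which would be consistent with the random nature of the crash. The sketch below shows two ways to get a zeroed object. DemoNetwork is a hypothetical stand-in for sTIDL_Network_t, and the sketch assumes the real structure is trivially copyable, which is what makes memset safe.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for sTIDL_Network_t; the real layout is not shown here.
struct DemoNetwork {
    int numLayers;
    int layerIds[8];
};

// memset is only safe for trivially copyable types.
static_assert(std::is_trivially_copyable<DemoNetwork>::value,
              "memset requires a trivially copyable type");

// Variant 1: same approach as the diff. Allocate, then zero the bytes explicitly.
inline DemoNetwork* makeZeroedWithMemset() {
    DemoNetwork* net = new DemoNetwork;  // members are indeterminate here
    std::memset(net, 0, sizeof(DemoNetwork));
    return net;                          // caller owns and must delete
}

// Variant 2: value-initialization. The trailing () zero-initializes the members.
inline std::unique_ptr<DemoNetwork> makeZeroedValueInit() {
    return std::unique_ptr<DemoNetwork>(new DemoNetwork());
}
```

Both variants produce an object whose fields all read as zero. As a side note on the second hunk, `delete` on a null pointer is a no-op in C++, so the added NULL check is harmless but not strictly required.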
    

  • I also tried your pre-built files and I can confirm that they fix the issue.

  • The changes look fine to me. If you have further issues, you may create a new ticket.