Part Number: TDA4VM
Hi all,
I am experimenting with the TIDL automated mixed precision algorithm on the TDA4VM board. I use EdgeAI TIDL Tools version 10_01_04_00 and the TI-provided cl-ort-resnet18-v1 model, with the mixed_precision_factor parameter set to 1.3 (i.e., a tolerated 30% latency increase compared to 8-bit inference). However, the algorithm fails during quantization with the following error:
-------- Running Calibration in Float Mode to Collect Tensor Statistics --------
Segmentation fault (core dumped)
[TIDL Import] ERROR: Failed to run calibration pass, system command returned error: 35584 -- [tidl_import_core.cpp, 678]
[TIDL Import] ERROR: Failed to run Calibration - Failed in function: tidlRunQuantStatsTool -- [tidl_import_core.cpp, 1746]
[TIDL Import] [QUANTIZATION] ERROR: - Failed in function: TIDL_quantStatsFixedOrFloat -- [tidl_import_quantize.cpp, 3969]
[TIDL Import] ERROR: - Failed in function: TIDL_executeAutomatedMixedPrecision -- [tidl_import_core.cpp, 4037]
[TIDL Import] ERROR: - Failed in function: TIDL_import_backend -- [tidl_import_core.cpp, 4419]
[TIDL Import] ERROR: - Failed in function: TIDL_runtimesPostProcessNet -- [tidl_runtimes_import_common.cpp, 1414]
[TIDL Import] [PARSER] ERROR: - Failed in function: TIDL_subgraphImport -- [tidl_onnxRtImport_EP.cpp, 1737]
[TIDL Import] [PARSER] ERROR: - Failed in function: TIDL_computeInvokeFunc -- [tidl_onnxRtImport_EP.cpp, 2511]
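For reference, a mixed_precision_factor of 1.3 can be read as a latency budget relative to the all-8-bit network. The sketch below only illustrates that arithmetic under this assumption; the compile-option key `advanced_options:mixed_precision_factor` is modeled on the edgeai-tidl-tools Python examples and should be checked against the documentation for the tools version in use, and the 10 ms figure is purely hypothetical.

```python
# Sketch: interpreting mixed_precision_factor as a latency budget.
# Assumption: the factor multiplies the all-8-bit inference latency,
# i.e. 1.3 means "up to 30% slower than pure 8-bit is acceptable".

def latency_budget_ms(latency_8bit_ms: float, mixed_precision_factor: float) -> float:
    """Maximum latency the mixed-precision search is allowed to reach."""
    return latency_8bit_ms * mixed_precision_factor

# Assumed compile-option key (modeled on the edgeai-tidl-tools examples);
# verify the exact name for your tools version before relying on it.
compile_options = {
    "tensor_bits": 8,
    "advanced_options:mixed_precision_factor": 1.3,
}

if __name__ == "__main__":
    factor = compile_options["advanced_options:mixed_precision_factor"]
    # A hypothetical 10 ms 8-bit latency gives a budget of about 13 ms.
    print(f"{latency_budget_ms(10.0, factor):.1f} ms")
```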
I traced the issue down to the point where the PC_dsp_test_dl_algo.out tool is invoked to collect statistics for model quantization. This tool hits a segmentation fault after calling getLayerIdToExecute() in c7x-mma-tidl/ti_dl/algo/src/tidl_alg.c: the function returns a layerId that is out of bounds, so invalid data is accessed afterwards. I suspect that the network binary file used by the tool somehow gets corrupted, but I am not sure where or why. The tricky part is that the behavior is random: sometimes the algorithm runs without any issues. I tried other (customer-specific) models and observed the same behavior, so I do not think the issue is model-specific.
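One way to test the corruption hypothesis is to keep copies of the artifacts directory from a successful run and from a failing run and compare them file by file. The sketch below is a hypothetical helper, not part of EdgeAI TIDL Tools: the directory arguments are placeholders, and it assumes the intermediate network binary is still present (or has been copied aside) when the comparison runs. Some differences, such as logs, may be expected even between two good runs.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(artifacts_dir):
    """Map each file path (relative to artifacts_dir) to its SHA-256 digest."""
    root = Path(artifacts_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_artifacts(dir_ok, dir_failed):
    """Return relative paths that exist in only one run or differ in content."""
    a, b = file_digests(dir_ok), file_digests(dir_failed)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python diff_artifacts.py artifacts_ok/ artifacts_failed/
    for name in diff_artifacts(sys.argv[1], sys.argv[2]):
        print(name)
```

If the network binary differs between a good and a bad run of the same model with identical settings, that would point at the import step rather than at PC_dsp_test_dl_algo.out itself.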
I am using the Docker environment recommended in the EdgeAI TIDL Tools documentation, so I suppose potential environment issues can be ruled out.
Has anyone observed similar behavior on their end, and do you know how to address it?
Note: I also tried running the model quantization with a more recent EdgeAI TIDL Tools version (11_01_06_00), but the issue is the same.
Best regards,
Mladen