TDA4VM: Compilation Failure in edgeai-modelmaker 10.1r cpu

wang haofei

Prodigy 50 points

Part Number: TDA4VM

Tool/software:

I'm having problems with training and compilation:

Model used: yolox-s-lite (training logs attached)
run (1).log
Achieved >90% accuracy during training
However, when compiling the model, accuracy drops to 0% (compilation logs attached)
2821.run.log
The artifacts folder does contain the expected .bin files after CPU training

Could you help me identify what might be causing this issue? Thank you!

3 months ago

0 Christina Kuruvilla 3 months ago

TI__Expert 4670 points

Hi Wang,

I will attempt to recreate and get back to you as soon as I can.

Warm regards,

Christina

0 Rohit Rao 3 months ago

TI__Prodigy 210 points

Hi Wang,

Please send the exact ONNX model you used for yolox-s-lite.
In the meantime, do any of the other models from config_detection.yaml work for you (e.g., yolox_nano, yolox_tiny)?

Best,

Rohit

0 wang haofei 3 months ago in reply to Rohit Rao

Prodigy 50 points

Hi Rohit,

Thanks for reaching out. The ONNX model for YOLOX-S-Lite is included in the complete set of files I generated using edgeai-modelmaker (attached: yolox_s_lite.zip). You should be able to find it within the directory. As for the other models listed in config_detection.yaml, I did try training with YOLOX-Nano, but unfortunately, the accuracy was also zero.

Let me know if you need any further details or if there's anything specific I can assist with regarding the models.

Best regards,

Wang

0 Christina Kuruvilla 3 months ago in reply to wang haofei

TI__Expert 4670 points

Thank you Wang.

We have been trying to recreate and will update you when we have more information.

Warm regards,

Christina

0 Rohit Rao 2 months ago in reply to wang haofei

TI__Prodigy 210 points

Hi Wang,

I have been working on recreating your issue. Can you give us some more information about how you set up your environment? How did you install the dependencies for compiling the model? A detailed, step-by-step description of exactly what you did will help me recreate your issue.

Please also clarify whether you are running the GPU or the CPU setup, because your logs indicate that a GPU was used.

Best,

Rohit

0 wang haofei 2 months ago in reply to Rohit Rao

Prodigy 50 points

Hi Rohit,

Thank you for following up. Here are the detailed steps I used to set up my environment:

6170.txt

absl-py                  2.3.0
addict                   2.4.0
aenum                    3.1.16
aliyun-python-sdk-core   2.16.0
aliyun-python-sdk-kms    2.16.5
attrs                    25.3.0
cachetools               5.5.2
certifi                  2025.4.26
cffi                     1.17.1
charset-normalizer       3.4.2
chumpy                   0.70
click                    8.2.1
cloudpickle              3.1.1
colorama                 0.4.6
colored                  2.3.0
coloredlogs              15.0.1
contourpy                1.3.2
crcmod                   1.7
cryptography             45.0.3
cycler                   0.12.1
Cython                   3.1.1
debugpy                  1.8.14
decorator                5.2.1
dill                     0.4.0
distro                   1.9.0
dlr                      1.13.0
edgeai_benchmark         10.1.4+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-benchmark
edgeai_modelmaker        10.1.0+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-modelmaker
edgeai_tensorvision      10.1.0+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-tensorvision
edgeai-torchmodelopt     10.1.0         /root/autodl-tmp/edgeai-tensorlab/edgeai-modeloptimization/torchmodelopt
einops                   0.8.1
exceptiongroup           1.3.0
filelock                 3.14.0
flatbuffers              1.12
fonttools                4.58.0
fsspec                   2024.6.1
google-auth              2.40.2
google-auth-oauthlib     0.4.6
graphviz                 0.20.3
grpcio                   1.71.0
h5py                     3.13.0
humanfriendly            10.0
idna                     3.10
iniconfig                2.1.0
Jinja2                   3.1.4
jmespath                 0.10.0
joblib                   1.5.1
json-tricks              3.17.3
kiwisolver               1.4.8
loguru                   0.7.3
Markdown                 3.8
markdown-it-py           3.0.0
MarkupSafe               2.1.5
matplotlib               3.10.3
mdurl                    0.1.2
ml_dtypes                0.5.1
mmcv                     2.2.0
mmdeploy                 1.3.1          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmdeploy
mmdet                    3.3.0          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmdetection
mmengine                 0.10.7
mmpose                   1.3.1          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmpose
model-index              0.1.11
mpmath                   1.3.0
multiprocess             0.70.18
munkres                  1.1.4
networkx                 3.3
ninja                    1.11.1.4
numpy                    1.23.0
nvidia-cublas-cu12       12.4.2.65
nvidia-cuda-cupti-cu12   12.4.99
nvidia-cuda-nvrtc-cu12   12.4.99
nvidia-cuda-runtime-cu12 12.4.99
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.0.44
nvidia-curand-cu12       10.3.5.119
nvidia-cusolver-cu12     11.6.0.99
nvidia-cusparse-cu12     12.3.0.142
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.99
nvidia-nvtx-cu12         12.4.99
oauthlib                 3.2.2
onnx                     1.14.0
onnx-graphsurgeon        0.3.26
onnxruntime-tidl         1.15.0
onnxscript               0.2.0
onnxsim                  0.4.35
opencv-python            4.11.0.86
opencv-python-headless   4.11.0.86
opendatalab              0.0.10
openmim                  0.3.9
openxlab                 0.1.2
ordered-set              4.1.0
osrt_model_tools         1.2
oss2                     2.17.0
packaging                24.2
pandas                   2.2.3
pillow                   11.2.1
Pillow-SIMD              9.5.0.post2
pip                      24.2
platformdirs             4.3.8
pluggy                   1.6.0
plyfile                  1.1
prettytable              3.16.0
progiter                 2.0.0
progressbar              2.5
protobuf                 3.20.2
psutil                   7.0.0
pyasn1                   0.6.1
pyasn1_modules           0.4.2
pycocotools              2.0.8
pycparser                2.22
pycryptodome             3.23.0
pydot                    4.0.0
Pygments                 2.19.1
pyparsing                3.2.3
pytest                   8.3.5
python-dateutil          2.9.0.post0
pytz                     2023.4
PyYAML                   6.0.2
requests                 2.28.2
requests-oauthlib        2.0.0
rich                     13.4.2
rsa                      4.9.1
scikit-learn             1.6.1
scipy                    1.10.0
setuptools               60.2.0
shapely                  2.1.1
six                      1.17.0
sympy                    1.13.3
tabulate                 0.9.0
tensorboard              2.11.2
tensorboard-data-server  0.6.1
tensorboard-plugin-wit   1.8.1
termcolor                3.1.0
terminaltables           3.1.10
tflite                   2.18.0
tflite-runtime           2.12.0
threadpoolctl            3.6.0
tidl_tools_package       10.1           /root/autodl-tmp/edgeai-tensorlab/edgeai-benchmark
tomli                    2.2.1
torch                    2.4.0+cu124
torchinfo                1.8.0
torchvision              0.19.0+cu124
tornado                  6.5.1
tqdm                     4.65.2
triton                   3.0.0
tvm                      0.12.0
typing_extensions        4.12.2
tzdata                   2025.2
urllib3                  1.26.20
wcwidth                  0.2.13
Werkzeug                 3.1.3
wheel                    0.45.1
wurlitzer                3.1.1
xtcocotools              1.14.3
yapf                     0.43.0

I created a virtual environment using pyenv with Python 3.10, activated it, and then installed dependencies by running the ./setup_gpu.sh script from the edgeai-modelmaker directory. Although this script targets GPU setup, I explicitly configured the training to run on CPU by setting num_gpus: 0 in the YAML file. During installation, I encountered a cyclic version conflict with onnxscript that appeared unresolvable, so I proceeded by ignoring it, assuming it wouldn’t critically impact the workflow.
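One way to make the GPU-versus-CPU question easier to answer from a package listing like the one above is to scan it for CUDA-enabled wheels. The following is only a minimal sketch, not part of edgeai-modelmaker: the function names are hypothetical, and it assumes the text is plain `pip list` output. Note that finding a `+cu124` build or `nvidia-*` wheels only shows the environment is GPU-capable; it does not prove that a given training run actually used a GPU.

```python
import re


def parse_pip_list(text):
    """Parse `pip list` output into {name: version}, skipping header rows."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


def cuda_build_tag(version):
    """Return the CUDA local-version tag (e.g. 'cu124'), or None if absent."""
    match = re.search(r"\+(cu\d+)$", version)
    return match.group(1) if match else None


def gpu_related_packages(packages):
    """Collect packages that indicate a GPU-capable installation."""
    found = {}
    for name, version in packages.items():
        tag = cuda_build_tag(version)
        if tag is not None:
            found[name] = tag
        elif name.startswith("nvidia-"):
            found[name] = "nvidia runtime wheel"
    return found


if __name__ == "__main__":
    sample = """\
torch                    2.4.0+cu124
numpy                    1.23.0
nvidia-cudnn-cu12        9.1.0.70
"""
    for name, reason in gpu_related_packages(parse_pip_list(sample)).items():
        print(f"{name}: {reason}")
```

Run against the listing above, this would flag torch, torchvision, and the nvidia-* packages, which is consistent with having installed through setup_gpu.sh.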

Best regards,
Wang

0 Rohit Rao 2 months ago in reply to wang haofei

TI__Prodigy 210 points

Hi Wang,

Thanks for the details. I'll try to recreate the environment and troubleshoot the setup. I'll get back to you soon with my findings.

Best,
Rohit

0 wang haofei 2 months ago in reply to Rohit Rao

Prodigy 50 points

Hi Rohit,

I hope you're doing well. I wanted to follow up on the environment recreation and troubleshooting you mentioned. I'd appreciate any updates you can provide.

Best, Wang

0 Rohit Rao 2 months ago in reply to wang haofei

TI__Prodigy 210 points

Hello Wang,

I have tried to reproduce your error, and I believe there are multiple underlying issues with the modelmaker stack. We will work with the development team to resolve them, but given the team's limited bandwidth, a fix for your modelmaker issues will likely take time.

As an alternative, I have created a training script for your custom data that covers YOLOv9 and earlier architectures. You can use the Python notebook linked below with any Python environment, whether conda or your existing pyenv environment.

Once you have your trained and exported onnx model, you can use edgeai-tidltools to handle the compilation for the device you have.

edgeai-tidltools repository: github.com/.../

Please let me know if you need any assistance with this new workflow.

YOLOX is not supported by edgeai-tidltools as of our latest release. Hence, we suggest using YOLOv9 or an earlier version.

Note: Training for more epochs with your data will likely increase the average precision. I also recommend you follow the exact COCO dataset setup like so:

Example file structure:

## Root Directory

* annotations: This directory contains the annotation files for the dataset.
  + instances_train2017.json: Instance annotations for the training set (2017).
  + instances_val2017.json: Instance annotations for the validation set (2017).
* images: This directory contains the image files for the dataset.
  + train2017: Training images (2017).
  + val2017: Validation images (2017).
  + test2017: Test images (2017).
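Before starting a training run, it may help to confirm that a dataset folder matches the layout listed above. The following is a minimal sketch using only the Python standard library. The function name and the choice to treat every listed entry as required are assumptions made for illustration; they are not taken from the notebook.

```python
from pathlib import Path

# Entries taken from the example file structure above.
EXPECTED_FILES = (
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
)
EXPECTED_DIRS = (
    "images/train2017",
    "images/val2017",
    "images/test2017",
)


def missing_entries(root):
    """Return the expected files and directories that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_layout.py /path/to/dataset
    problems = missing_entries(sys.argv[1] if len(sys.argv) > 1 else ".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Dataset layout matches the expected COCO-style structure.")
```

An empty result only means the folders and annotation files exist; it does not validate the contents of the JSON annotations.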

As for your other E2E thread (linked below), this workflow may also serve as an alternative solution.

e2e.ti.com/.../tda4vm-edgeai-modelmaker-compile-issue-invalid-layer-name-error-on-tiscapes2017_driving-model-0-accuracy

https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/7178.trainYOLO.ipynb

0 wang haofei 2 months ago in reply to Rohit Rao

Prodigy 50 points

Hello Rohit,

Thank you for your prompt response and for creating an alternative training script for me. I appreciate your efforts to address the issues with the modelmaker stack, and I understand that it might take some time for the development team to resolve them due to their bandwidth.

I will definitely try out the new training script you've provided, and I'll use the edgeai-tidltools for compilation as you suggested.

Please keep me updated if there are any changes or improvements made to the modelmaker. I'd be eager to try it out once it's ready.

Thanks again for your help and support.

Best regards, Wang

0 Rohit Rao 2 months ago in reply to wang haofei

TI__Prodigy 210 points

Sounds good, we'll keep you updated on modelmaker.

Best,

Rohit
