
TDA4VM: Compilation Failure in edgeai-modelmaker 10.1 (CPU)

Part Number: TDA4VM


I'm having problems with training and compilation:

  • Model used: yolox-s-lite (training log attached: run (1).log)

  • Training reached >90% accuracy

  • However, after compiling the model, the accuracy drops to 0% (compilation log attached: 2821.run.log)

  • The artifacts folder does contain the expected .bin files after CPU training

Could you help me identify what might be causing this issue? Thank you!
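Because the artifacts folder contains .bin files even though the compiled model scores 0%, it can help to record exactly which .bin files were produced and how large they are when attaching logs to a report like this one. The minimal sketch below does that with the Python standard library only; the function names `list_bin_artifacts` and `empty_artifacts` are hypothetical, and the directory you pass is whatever artifacts folder your run wrote to. File presence alone does not show that quantization or calibration succeeded, but a zero-byte file would be an obvious red flag.

```python
from pathlib import Path
from typing import Dict, List


def list_bin_artifacts(artifacts_dir: str) -> Dict[str, int]:
    """Map every .bin file under artifacts_dir (searched recursively) to its size in bytes."""
    root = Path(artifacts_dir)
    return {str(p.relative_to(root)): p.stat().st_size for p in sorted(root.rglob("*.bin"))}


def empty_artifacts(sizes: Dict[str, int]) -> List[str]:
    """Return the .bin files that are zero bytes long."""
    return [name for name, size in sizes.items() if size == 0]
```

For example, `list_bin_artifacts("path/to/artifacts")` returns a dictionary of relative paths and sizes that can be pasted into the thread alongside the logs.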

  • Hi Wang,

    I will attempt to recreate the issue and get back to you as soon as I can.

    Warm regards,

    Christina

  • Hi Wang,


    Please send your exact ONNX model for yolox-s-lite.
    In the meantime, do any of the other models from config_detection.yaml work for you (e.g., yolox_nano, yolox_tiny)?

    Best,

    Rohit

  • Hi Rohit,
    Thanks for reaching out. The ONNX model for YOLOX-S-Lite is included in the complete set of files I generated using edgeai-modelmaker (attached: yolox_s_lite.zip); you should be able to find it within that directory. As for the other models listed in config_detection.yaml, I did try training with YOLOX-Nano, but unfortunately, the accuracy was also zero.
    Let me know if you need any further details or if there's anything specific I can assist with regarding the models.
    Best regards,
    Wang
  • Thank you Wang.

    We have been trying to recreate the issue and will update you when we have more information.

    Warm regards,

    Christina

  • Hi Wang,

     

    I have been working on recreating your issue. Can you give us some more information about how you set up your environment? How did you install the dependencies needed to compile the model? A detailed, step-by-step description of exactly what you did will help me recreate your issue.

    Please also clarify whether you are running the GPU or the CPU setup, because your logs indicate that the GPU is being used.

     

    Best,

    Rohit

  • Hi Rohit,

    Thank you for following up. Here are the detailed steps I used to set up my environment:

    absl-py                  2.3.0
    addict                   2.4.0
    aenum                    3.1.16
    aliyun-python-sdk-core   2.16.0
    aliyun-python-sdk-kms    2.16.5
    attrs                    25.3.0
    cachetools               5.5.2
    certifi                  2025.4.26
    cffi                     1.17.1
    charset-normalizer       3.4.2
    chumpy                   0.70
    click                    8.2.1
    cloudpickle              3.1.1
    colorama                 0.4.6
    colored                  2.3.0
    coloredlogs              15.0.1
    contourpy                1.3.2
    crcmod                   1.7
    cryptography             45.0.3
    cycler                   0.12.1
    Cython                   3.1.1
    debugpy                  1.8.14
    decorator                5.2.1
    dill                     0.4.0
    distro                   1.9.0
    dlr                      1.13.0
    edgeai_benchmark         10.1.4+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-benchmark
    edgeai_modelmaker        10.1.0+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-modelmaker
    edgeai_tensorvision      10.1.0+626e8e5 /root/autodl-tmp/edgeai-tensorlab/edgeai-tensorvision
    edgeai-torchmodelopt     10.1.0         /root/autodl-tmp/edgeai-tensorlab/edgeai-modeloptimization/torchmodelopt
    einops                   0.8.1
    exceptiongroup           1.3.0
    filelock                 3.14.0
    flatbuffers              1.12
    fonttools                4.58.0
    fsspec                   2024.6.1
    google-auth              2.40.2
    google-auth-oauthlib     0.4.6
    graphviz                 0.20.3
    grpcio                   1.71.0
    h5py                     3.13.0
    humanfriendly            10.0
    idna                     3.10
    iniconfig                2.1.0
    Jinja2                   3.1.4
    jmespath                 0.10.0
    joblib                   1.5.1
    json-tricks              3.17.3
    kiwisolver               1.4.8
    loguru                   0.7.3
    Markdown                 3.8
    markdown-it-py           3.0.0
    MarkupSafe               2.1.5
    matplotlib               3.10.3
    mdurl                    0.1.2
    ml_dtypes                0.5.1
    mmcv                     2.2.0
    mmdeploy                 1.3.1          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmdeploy
    mmdet                    3.3.0          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmdetection
    mmengine                 0.10.7
    mmpose                   1.3.1          /root/autodl-tmp/edgeai-tensorlab/edgeai-mmpose
    model-index              0.1.11
    mpmath                   1.3.0
    multiprocess             0.70.18
    munkres                  1.1.4
    networkx                 3.3
    ninja                    1.11.1.4
    numpy                    1.23.0
    nvidia-cublas-cu12       12.4.2.65
    nvidia-cuda-cupti-cu12   12.4.99
    nvidia-cuda-nvrtc-cu12   12.4.99
    nvidia-cuda-runtime-cu12 12.4.99
    nvidia-cudnn-cu12        9.1.0.70
    nvidia-cufft-cu12        11.2.0.44
    nvidia-curand-cu12       10.3.5.119
    nvidia-cusolver-cu12     11.6.0.99
    nvidia-cusparse-cu12     12.3.0.142
    nvidia-nccl-cu12         2.20.5
    nvidia-nvjitlink-cu12    12.4.99
    nvidia-nvtx-cu12         12.4.99
    oauthlib                 3.2.2
    onnx                     1.14.0
    onnx-graphsurgeon        0.3.26
    onnxruntime-tidl         1.15.0
    onnxscript               0.2.0
    onnxsim                  0.4.35
    opencv-python            4.11.0.86
    opencv-python-headless   4.11.0.86
    opendatalab              0.0.10
    openmim                  0.3.9
    openxlab                 0.1.2
    ordered-set              4.1.0
    osrt_model_tools         1.2
    oss2                     2.17.0
    packaging                24.2
    pandas                   2.2.3
    pillow                   11.2.1
    Pillow-SIMD              9.5.0.post2
    pip                      24.2
    platformdirs             4.3.8
    pluggy                   1.6.0
    plyfile                  1.1
    prettytable              3.16.0
    progiter                 2.0.0
    progressbar              2.5
    protobuf                 3.20.2
    psutil                   7.0.0
    pyasn1                   0.6.1
    pyasn1_modules           0.4.2
    pycocotools              2.0.8
    pycparser                2.22
    pycryptodome             3.23.0
    pydot                    4.0.0
    Pygments                 2.19.1
    pyparsing                3.2.3
    pytest                   8.3.5
    python-dateutil          2.9.0.post0
    pytz                     2023.4
    PyYAML                   6.0.2
    requests                 2.28.2
    requests-oauthlib        2.0.0
    rich                     13.4.2
    rsa                      4.9.1
    scikit-learn             1.6.1
    scipy                    1.10.0
    setuptools               60.2.0
    shapely                  2.1.1
    six                      1.17.0
    sympy                    1.13.3
    tabulate                 0.9.0
    tensorboard              2.11.2
    tensorboard-data-server  0.6.1
    tensorboard-plugin-wit   1.8.1
    termcolor                3.1.0
    terminaltables           3.1.10
    tflite                   2.18.0
    tflite-runtime           2.12.0
    threadpoolctl            3.6.0
    tidl_tools_package       10.1           /root/autodl-tmp/edgeai-tensorlab/edgeai-benchmark
    tomli                    2.2.1
    torch                    2.4.0+cu124
    torchinfo                1.8.0
    torchvision              0.19.0+cu124
    tornado                  6.5.1
    tqdm                     4.65.2
    triton                   3.0.0
    tvm                      0.12.0
    typing_extensions        4.12.2
    tzdata                   2025.2
    urllib3                  1.26.20
    wcwidth                  0.2.13
    Werkzeug                 3.1.3
    wheel                    0.45.1
    wurlitzer                3.1.1
    xtcocotools              1.14.3
    yapf                     0.43.0

    I created a virtual environment using pyenv with Python 3.10, activated it, and then installed dependencies by running the ./setup_gpu.sh script from the edgeai-modelmaker directory. Although this script targets GPU setup, I explicitly configured the training to run on CPU by setting num_gpus: 0 in the YAML file. During installation, I encountered a cyclic version conflict with onnxscript that appeared unresolvable, so I proceeded by ignoring it, assuming it wouldn’t critically impact the workflow.

    Best regards,
    Wang 
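The package list above already answers part of the GPU-versus-CPU question: running ./setup_gpu.sh installed CUDA builds such as torch 2.4.0+cu124 and the nvidia-*-cu12 wheels, independently of the num_gpus: 0 setting in the YAML file. A minimal sketch for flagging such packages in `pip list` output follows. The function names are hypothetical, and the heuristic (a `+cuNNN` local version tag or an `nvidia-...-cuNN` package name) is an assumption that covers the packages shown here, not an exhaustive check.

```python
import re
import subprocess
import sys
from typing import Dict, List


def parse_pip_list(text: str) -> Dict[str, str]:
    """Parse `pip list` output ("name  version  [location]") into {name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the "Package / Version" header with its dashed underline.
        if len(parts) < 2 or parts[0] == "Package" or parts[0].startswith("-"):
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def cuda_indicators(packages: Dict[str, str]) -> List[str]:
    """Return 'name version' entries that look like CUDA-enabled builds."""
    hits = []
    for name, version in sorted(packages.items()):
        # Heuristic: a "+cuNNN" local version tag (e.g. torch 2.4.0+cu124)
        # or an NVIDIA runtime wheel such as nvidia-cudnn-cu12.
        if re.search(r"\+cu\d+", version) or re.fullmatch(r"nvidia-.+-cu\d+", name):
            hits.append(f"{name} {version}")
    return hits


def installed_packages() -> Dict[str, str]:
    """Run `pip list` for the current interpreter and parse its output."""
    result = subprocess.run([sys.executable, "-m", "pip", "list"],
                            capture_output=True, text=True, check=True)
    return parse_pip_list(result.stdout)
```

Running `print("\n".join(cuda_indicators(installed_packages())))` inside the pyenv environment lists the CUDA-specific packages; an empty result would suggest a CPU-only stack.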

  • Hi Wang,

    Thanks for the details. I'll try to recreate the environment and troubleshoot the setup. I'll get back to you soon with my findings.

    Best,
    Rohit

  • Hi Rohit,
    I hope you're doing well. I wanted to follow up on the environment recreation and troubleshooting you mentioned. I'd appreciate any updates you can provide.
    Best, Wang
  • Hello Wang,

    I have tried to reproduce your error, and I believe there are multiple underlying issues with the modelmaker stack. We will work with the development team to resolve them, but due to their limited bandwidth, it will likely take some time before your modelmaker issues are resolved.

    As an alternative, I have created a training script for your custom data using YOLO architectures up to and including YOLOv9. You can use the Python notebook linked below with any Python environment, whether that is conda or your existing pyenv environment.

    Once you have trained and exported your ONNX model, you can use edgeai-tidl-tools to compile it for your device.
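For the compilation step, the edgeai-tidl-tools Python examples drive TI's onnxruntime-tidl build through an `InferenceSession` that uses the `TIDLCompilationProvider`, then run calibration images through that session to produce the artifacts. The sketch below follows that pattern under stated assumptions: the option names (`tidl_tools_path`, `artifacts_folder`, `tensor_bits`, `advanced_options:calibration_frames`, `advanced_options:calibration_iterations`) are modeled on those examples and should be checked against the release you install, the `TIDL_TOOLS_PATH` environment variable is assumed to point at your tidl_tools directory, and both function names are hypothetical. Calibration should use real, preprocessed images from your dataset, since poor calibration data can degrade quantized accuracy.

```python
import os
from typing import Dict, List


def build_tidl_compile_options(artifacts_folder: str,
                               tensor_bits: int = 8,
                               calibration_frames: int = 2,
                               calibration_iterations: int = 5) -> Dict[str, object]:
    """Collect provider options for TIDL compilation (names assumed from the
    edgeai-tidl-tools Python examples; verify against your release)."""
    tidl_tools_path = os.environ.get("TIDL_TOOLS_PATH", "")
    if not tidl_tools_path:
        raise EnvironmentError("TIDL_TOOLS_PATH must point to the tidl_tools directory")
    return {
        "tidl_tools_path": tidl_tools_path,
        "artifacts_folder": artifacts_folder,
        "tensor_bits": tensor_bits,
        "advanced_options:calibration_frames": calibration_frames,
        "advanced_options:calibration_iterations": calibration_iterations,
    }


def compile_for_tidl(model_path: str, options: Dict[str, object],
                     calibration_feeds: List[Dict[str, object]]) -> None:
    """Compile model_path by running calibration inputs through a TIDL compilation session."""
    # Requires TI's onnxruntime-tidl wheel, which registers the TIDL providers.
    import onnxruntime as rt

    os.makedirs(str(options["artifacts_folder"]), exist_ok=True)
    session = rt.InferenceSession(
        model_path,
        providers=["TIDLCompilationProvider", "CPUExecutionProvider"],
        provider_options=[options, {}],
        sess_options=rt.SessionOptions(),
    )
    # Each feed maps the model's input name to a preprocessed image batch
    # (e.g. a NumPy array); use real images from your dataset, not random data.
    for feed in calibration_feeds:
        session.run(None, feed)
```

A call would look like `compile_for_tidl("model.onnx", build_tidl_compile_options("artifacts/model"), feeds)`, run on the host PC where tidl_tools is installed.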

    edgeai-tidl-tools repository: github.com/.../

    Please let me know if you need any assistance with this new workflow.
     

    YOLOX is currently not supported by edgeai-tidl-tools as of our latest release, so we suggest using YOLOv9 or earlier.

    Note: Training for more epochs with your data will likely increase the average precision. I also recommend that you follow the exact COCO dataset setup, like so:

    Example file structure:

    Root directory
    * annotations: annotation files for the dataset.
        + instances_train2017.json: instance annotations for the training set (2017).
        + instances_val2017.json: instance annotations for the validation set (2017).
    * images: image files for the dataset.
        + train2017: training images (2017).
        + val2017: validation images (2017).
        + test2017: test images (2017).
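A quick way to confirm that a dataset matches this layout before training is to check for the listed directories and annotation files, and to make sure each annotation file has the top-level `images`, `annotations`, and `categories` keys of the COCO detection format. A minimal sketch, assuming the root directory layout exactly as listed (including test2017); `check_coco_layout` is a hypothetical helper name:

```python
import json
from pathlib import Path
from typing import List

# Layout as listed above; paths are relative to the dataset root directory.
EXPECTED_DIRS = ["annotations", "images/train2017", "images/val2017", "images/test2017"]
EXPECTED_ANNOTATIONS = ["annotations/instances_train2017.json",
                        "annotations/instances_val2017.json"]
COCO_KEYS = ("images", "annotations", "categories")


def check_coco_layout(root: str) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks OK."""
    base = Path(root)
    problems = []
    for d in EXPECTED_DIRS:
        if not (base / d).is_dir():
            problems.append(f"missing directory: {d}")
    for ann in EXPECTED_ANNOTATIONS:
        path = base / ann
        if not path.is_file():
            problems.append(f"missing annotation file: {ann}")
            continue
        try:
            with path.open() as f:
                data = json.load(f)
        except json.JSONDecodeError as exc:
            problems.append(f"{ann} is not valid JSON: {exc}")
            continue
        # Top-level keys every COCO detection annotation file should contain.
        missing = [k for k in COCO_KEYS if k not in data]
        if missing:
            problems.append(f"{ann} lacks keys: {', '.join(missing)}")
    return problems
```

Printing `check_coco_layout("path/to/dataset")` before starting training surfaces layout mistakes early, rather than as a low average precision later.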

    As for your other E2E thread, you might also try this as an alternative solution:

    e2e.ti.com/.../tda4vm-edgeai-modelmaker-compile-issue-invalid-layer-name-error-on-tiscapes2017_driving-model-0-accuracy

    https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/791/7178.trainYOLO.ipynb

  • Hello Rohit,
    Thank you for your prompt response and for creating an alternative training script for me. I appreciate your efforts to address the issues with the modelmaker stack, and I understand that it might take some time for the development team to resolve them due to their bandwidth.
    I will definitely try out the new training script you've provided, and I'll use the edgeai-tidltools for compilation as you suggested. 
    Please keep me updated if there are any changes or improvements made to the modelmaker. I'd be eager to try it out once it's ready.
    Thanks again for your help and support.
    Best regards, Wang
  • Sounds good, we'll keep you updated on modelmaker.

    Best,

    Rohit