TDA4VM: Conversion from .pth to .onnx after training yolox_s_lite through edgeai-modelmaker

Part Number: TDA4VM

I am training a yolox_s_lite model on a custom dataset using edgeai-modelmaker for 200 epochs, but I stopped the training at the 103rd epoch and now want the .onnx model for that epoch. I have the epoch_103.pth file in the run directory. How can it be converted to .onnx?

In addition, how can training be resumed after it has been stopped partway (given that the training is done through edgeai-modelmaker)?

I also tried training the yolox_s_lite model for a single epoch through the same edgeai-modelmaker setup, just to check the output. I get the .onnx and .prototxt files, but compilation fails with the error below:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/chai/edgeai-tensorlab/edgeai-modelmaker/data/projects/Merged_Dataset_Updated/run/20250813-144622/yolox_s_lite/compilation/work/od-8220/model/model.onnx failed:/root/onnxruntime/onnxruntime/core/graph/model.cc:149 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9

TASKS TOTAL=1, NUM_RUNNING=0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 4.48it/s, postfix={'RUNNING': [], 'COMPLETED': ['od-8220']}]
WARNING: Benchmark - completed: 0/1
TASKS TOTAL=1, NUM_RUNNING=0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.06s/it, postfix={'RUNNING': [], 'COMPLETED': ['od-8220']}]
INFO: packaging artifacts to /home/chai/edgeai-tensorlab/edgeai-modelmaker/data/projects/Merged_Dataset_Updated/run/20250813-144622/yolox_s_lite/compilation/pkg please wait...
WARNING:20250813-155953: could not package - /home/chai/edgeai-tensorlab/edgeai-modelmaker/data/projects/Merged_Dataset_Updated/run/20250813-144622/yolox_s_lite/compilation/work/od-8220
Traceback (most recent call last):
  File "/home/chai/edgeai-tensorlab/edgeai-modelmaker/./scripts/run_modelmaker.py", line 153, in <module>
    main(config)
  File "/home/chai/edgeai-tensorlab/edgeai-modelmaker/./scripts/run_modelmaker.py", line 88, in main
    model_runner.run()
  File "/home/chai/edgeai-tensorlab/edgeai-modelmaker/edgeai_modelmaker/ai_modules/vision/runner.py", line 228, in run
    self.model_compilation.run()
  File "/home/chai/edgeai-tensorlab/edgeai-modelmaker/edgeai_modelmaker/ai_modules/vision/compilation/edgeai_benchmark.py", line 163, in run
    edgeai_benchmark.interfaces.package_artifacts(self.settings, self.work_dir, out_dir=self.package_dir, custom_model=True)
  File "/home/chai/edgeai-tensorlab/edgeai-benchmark/edgeai_benchmark/interfaces/run_package.py", line 271, in package_artifacts
    with open(os.path.join(out_dir,'artifacts.yaml'), 'w') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/home/chai/edgeai-tensorlab/edgeai-modelmaker/data/projects/Merged_Dataset_Updated/run/20250813-144622/yolox_s_lite/compilation/pkg/artifacts.yaml'

  • Hi Chaitanya,

    This will take some research.  If you do not hear from me by Monday, please ping me.

    Regards,

    Chris

  • Hey Chris,

    Have you found a solution to my query?

    Regards,

    Chaitanya

  • Hi Chaitanya,

    Can you please send me the command line you are using to generate the model? I cannot make a determination from the output. Also, can you run "pip list" in your environment so I can see which module versions you are running?

    Regards,

    Chris

  • Hi Chris,

    I am training yolox_s_lite with edgeai-modelmaker using the command line below:

    "./run_modelmaker.sh TDA4VM config_detection.yaml"

    After running pip list, I got the module versions below:

    absl-py 2.1.0
    addict 2.4.0
    aenum 3.1.15
    aliyun-python-sdk-core 2.16.0
    aliyun-python-sdk-kms 2.16.5
    attrs 25.1.0
    autocfg 0.0.8
    cachetools 5.5.1
    caffe2onnx 1.0.2
    certifi 2025.1.31
    cffi 1.17.1
    charset-normalizer 3.4.1
    chumpy 0.70
    click 8.1.8
    cloudpickle 3.1.1
    colorama 0.4.6
    colored 2.3.0
    coloredlogs 15.0.1
    contourpy 1.3.1
    crcmod 1.7
    cryptography 44.0.1
    cycler 0.12.1
    Cython 3.0.12
    dataclasses 0.6
    debugpy 1.8.12
    decorator 5.1.1
    dill 0.3.9
    distro 1.9.0
    dlr 1.13.0
    edgeai_benchmark 10.1.4+626e8e5 /home/chai/edgeai-tensorlab/edgeai-benchmark
    edgeai_modelmaker 10.1.0+626e8e5 /home/chai/edgeai-tensorlab/edgeai-modelmaker
    edgeai_tensorvision 10.1.0+626e8e5 /home/chai/edgeai-tensorlab/edgeai-tensorvision
    edgeai-torchmodelopt 10.1.0 /home/chai/edgeai-tensorlab/edgeai-modeloptimization/torchmodelopt
    einops 0.8.1
    exceptiongroup 1.2.2
    filelock 3.14.0
    flatbuffers 1.12
    fonttools 4.56.0
    fsspec 2024.6.1
    gluoncv 0.10.5.post0
    google-auth 2.38.0
    google-auth-oauthlib 0.4.6
    graphviz 0.20.3
    grpcio 1.70.0
    h5py 3.12.1
    hf-xet 1.1.8
    huggingface-hub 0.34.4
    humanfriendly 10.0
    idna 3.10
    iniconfig 2.0.0
    Jinja2 3.1.4
    jmespath 0.10.0
    joblib 1.4.2
    json-tricks 3.17.3
    kiwisolver 1.4.8
    loguru 0.7.3
    Markdown 3.7
    markdown-it-py 3.0.0
    MarkupSafe 2.1.5
    matplotlib 3.10.0
    mdurl 0.1.2
    ml_dtypes 0.5.1
    mmcv 2.1.0
    mmdeploy 1.3.1 /home/chai/edgeai-tensorlab/edgeai-mmdeploy
    mmdet 3.2.0
    mmengine 0.10.6
    mmpose 1.3.1 /home/chai/edgeai-tensorlab/edgeai-mmpose
    model-index 0.1.11
    mpmath 1.3.0
    multiprocess 0.70.17
    munkres 1.1.4
    networkx 3.3
    ninja 1.11.1.3
    numpy 1.23.0
    nvidia-cublas-cu12 12.4.2.65
    nvidia-cuda-cupti-cu12 12.4.99
    nvidia-cuda-nvrtc-cu12 12.4.99
    nvidia-cuda-runtime-cu12 12.4.99
    nvidia-cudnn-cu12 9.1.0.70
    nvidia-cufft-cu12 11.2.0.44
    nvidia-curand-cu12 10.3.5.119
    nvidia-cusolver-cu12 11.6.0.99
    nvidia-cusparse-cu12 12.3.0.142
    nvidia-nccl-cu12 2.20.5
    nvidia-nvjitlink-cu12 12.4.99
    nvidia-nvtx-cu12 12.4.99
    oauthlib 3.2.2
    onnx 1.17.0
    onnx-graphsurgeon 0.3.26
    onnx-ir 0.1.1
    onnxruntime-tidl 1.15.0
    onnxscript 0.3.0
    onnxsim 0.4.35
    opencv-python 4.11.0.86
    opencv-python-headless 4.11.0.86
    opendatalab 0.0.10
    openmim 0.3.9
    openxlab 0.1.2
    ordered-set 4.1.0
    osrt_model_tools 1.2 /home/chai/edgeai-tidl-tools/osrt-model-tools
    oss2 2.17.0
    packaging 24.2
    pandas 2.2.3
    pillow 11.1.0
    Pillow-SIMD 9.5.0.post2
    pip 25.1.1
    platformdirs 4.3.6
    pluggy 1.5.0
    plyfile 1.1
    portalocker 3.2.0
    prettytable 3.14.0
    progiter 2.0.0
    progressbar 2.5
    protobuf 3.20.2
    psutil 7.0.0
    pyasn1 0.6.1
    pyasn1_modules 0.4.1
    pybind11 3.0.0
    pybind11-global 3.0.0
    pycocotools 2.0.8
    pycparser 2.22
    pycryptodome 3.21.0
    pydot 3.0.4
    Pygments 2.19.1
    pyparsing 3.2.1
    pytest 8.3.4
    python-dateutil 2.9.0.post0
    pytz 2023.4
    PyYAML 6.0.2
    requests 2.28.2
    requests-oauthlib 2.0.0
    rich 13.4.2
    rsa 4.9
    safetensors 0.6.2
    scikit-learn 1.6.1
    scipy 1.13.1
    setuptools 60.2.0
    shapely 2.0.7
    six 1.17.0
    sympy 1.13.1
    tabulate 0.9.0
    tensorboard 2.11.2
    tensorboard-data-server 0.6.1
    tensorboard-plugin-wit 1.8.1
    termcolor 2.5.0
    terminaltables 3.1.10
    tflite 2.18.0
    tflite-runtime 2.12.0
    thop 0.1.1.post2209072238
    threadpoolctl 3.5.0
    tidl_tools_package 10.1 /home/chai/edgeai-tensorlab/edgeai-benchmark
    timm 1.0.19
    tomli 2.2.1
    torch 2.4.0+cu124
    torchinfo 1.8.0
    torchvision 0.19.0+cu124
    tornado 6.4.2
    tqdm 4.65.2
    triton 3.0.0 1
    tvm 0.12.0
    typing_extensions 4.12.2
    tzdata 2025.1
    urllib3 1.26.20
    wcwidth 0.2.13
    Werkzeug 3.1.3
    wheel 0.45.1
    wurlitzer 3.1.1
    xtcocotools 1.14.3
    yacs 0.1.8
    yapf 0.43.0
    yolox 0.1.0 /home/chai/edgeai-yolox-main

    I think this covers what you asked for. If not, please ping me.

    Regards,

    Chaitanya
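The full listing above can be narrowed to the packages most relevant to the error. The following is a minimal sketch using only the Python standard library. The package names are copied from the pip list in this thread, and the helper name `installed_versions` is hypothetical, not part of edgeai-modelmaker.

```python
from importlib import metadata

# Distribution names copied from the pip list in this thread; adjust as needed.
PACKAGES = ["onnx", "onnxruntime-tidl", "torch", "edgeai_benchmark", "edgeai_modelmaker"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:20s} {version or 'not installed'}")
```

Run inside the same Python environment that run_modelmaker.sh uses, this prints one line per package, which is easier to compare against the error message than the full listing.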

  • Hi Chaitanya,

    The error you are getting is an ONNX error, not a TIDL error: the model uses IR version 10, but the runtime supports at most IR version 9. The version of the import tool you are using does not match the model's internal version. Try downgrading the import tool to the version that matches the model.

    If you send me the model, any other files associated with the model, the import file, the inference file, and the input data, I will try it out and give more specific information.

    Regards,

    Chris

  • Hi Chris,

    Can you elaborate on the above answer? Which import tool are you referring to, and how does it affect the internal pipeline of edgeai-modelmaker such that it cannot compile the model after training?

    In addition, my other two queries, repeated below, are still unanswered:

    1. I am training a yolox_s_lite model on a custom dataset using edgeai-modelmaker for 200 epochs, but I stopped the training at the 103rd epoch and now want the .onnx model for that epoch. I have the epoch_103.pth file in the run directory. How can it be converted to .onnx?

    2. How can training be resumed after it has been stopped partway (given that the training is done through edgeai-modelmaker)?

    Thanks,

    Chaitanya
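The version mismatch reported in the error ("Unsupported model IR version: 10, max supported IR version: 9") can be confirmed by reading the IR version stored in the exported model file. The sketch below assumes the `onnx` Python package is importable in the same environment. The helper names `is_loadable` and `describe` are hypothetical, and the limit of 9 is taken from the error message in this thread rather than from any TI documentation.

```python
# Upper bound quoted in the error message in this thread
# ("max supported IR version: 9"); an assumption for other setups.
MAX_SUPPORTED_IR = 9


def is_loadable(ir_version, max_supported=MAX_SUPPORTED_IR):
    """Return True if a runtime with this IR limit would accept the model."""
    return ir_version <= max_supported


def describe(model_path):
    """Load an ONNX file and return its IR version and opset versions."""
    import onnx  # lazy import: only needed when a real file is inspected

    model = onnx.load(model_path)
    # An empty domain string denotes the default "ai.onnx" operator set.
    opsets = {op.domain or "ai.onnx": op.version for op in model.opset_import}
    return model.ir_version, opsets
```

For example, calling `describe(...)` on the model.onnx path shown in the error and passing the returned IR version to `is_loadable` would be expected to give `False` for the failing model. This only diagnoses the mismatch; it does not identify which package in the environment wrote the newer IR version.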