SK-AM68: edgeai-modelmaker: Training fails with GPU enabled

Part Number: SK-AM68
Other Parts Discussed in Thread: AM68, AM68A

I am using the AM68 SDK 10.1 with an NVIDIA GeForce RTX 5070.
Because this GPU requires CUDA 12.8 or newer, I installed CUDA 12.8 and set up my environment with PyTorch 2.7.1+cu128, following this post:
https://github.com/lllyasviel/Fooocus/issues/4088
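
A quick way to confirm that the PyTorch wheel in use really is a CUDA 12.8+ build, and that it can see the RTX 5070, is sketched below. It parses the wheel's local version tag (for example `+cu128`) and, only if torch imports, prints the CUDA version torch was built for and the GPU's compute capability. The helpers `cuda_tag`, `meets_cuda` and `live_report` are hypothetical names for this sketch; the torch calls it uses (`torch.version.cuda`, `torch.cuda.get_device_capability`, `torch.cuda.get_arch_list`) are standard PyTorch APIs.

```python
# Sketch only: checks that the installed PyTorch wheel targets CUDA >= 12.8.
# cuda_tag(), meets_cuda() and live_report() are hypothetical helpers.
import re
from typing import Optional, Tuple


def cuda_tag(torch_version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a wheel tag such as '2.7.1+cu128', else None."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None  # CPU-only or non-CUDA build, or no local tag at all
    return int(match.group(1)), int(match.group(2))


def meets_cuda(torch_version: str, minimum: Tuple[int, int] = (12, 8)) -> bool:
    """True if the wheel's CUDA tag is at least `minimum`."""
    tag = cuda_tag(torch_version)
    return tag is not None and tag >= minimum


def live_report() -> str:
    """Describe the torch build and GPU actually visible in this environment."""
    try:
        import torch
    except ImportError:
        return "torch is not importable in this environment"
    lines = [
        f"torch {torch.__version__} (built for CUDA {torch.version.cuda}), "
        f"CUDA 12.8+ wheel: {meets_cuda(str(torch.__version__))}"
    ]
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        lines.append(f"GPU 0: {torch.cuda.get_device_name(0)}, sm_{major}{minor}")
        lines.append(f"kernels compiled for: {torch.cuda.get_arch_list()}")
    else:
        lines.append("torch.cuda.is_available() is False")
    return "\n".join(lines)


if __name__ == "__main__":
    print(live_report())
```

Run it with the same interpreter ModelMaker uses (the traceback below shows `~/.pyenv/versions/py310`). An RTX 5070 is expected to report `sm_120`.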

Under PyTorch 2.7.1+cu128 with the GPU enabled (num_gpus: 1), these two sample configurations train and compile successfully on the GPU:
./run_modelmaker.sh AM68A config_classification.yaml
./run_modelmaker.sh AM68A config_segmentation.yaml

However, training with the object detection configuration fails on the GPU:

./run_modelmaker.sh AM68A config_detection.yaml
Number of AVX cores detected in PC: 32
AVX compilation speedup in PC     : 1
Target device                     : AM68A
PYTHONPATH                        : .:
TIDL_TOOLS_PATH                   : ../edgeai-benchmark/tools/tidl_tools_package/AM68A/tidl_tools
LD_LIBRARY_PATH                   : ../edgeai-benchmark/tools/tidl_tools_package/AM68A/tidl_tools:
argv: ['./scripts/run_modelmaker.py', 'config_detection_new.yaml', '--target_device', 'AM68A']
---------------------------------------------------------------------
INFO: ModelMaker - task_type:detection model_name:yolox_s_lite dataset_name:tiscapes2017_driving run_name:20250916-155615/yolox_s_lite
- Model: yolox_s_lite
- TargetDevices & Estimated Inference Times (ms): {'TDA4VM': 10.14, 'AM62A': 43.94, 'AM67A': '43.94 (with 1/2 device capability)', 'AM68A': 10.22, 'AM69A': '9.82 (with 1/4th device capability)'}
- This model can be compiled for the above device(s).
---------------------------------------------------------------------
INFO: ModelMaker - dataset split sizes {'train': 393, 'val': 107}
INFO: ModelMaker - max_num_files is set to: 10000
INFO: ModelMaker - dataset split sizes are limited to: {'train': 393, 'val': 107}
INFO: ModelMaker - dataset loading OK
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
INFO: ModelMaker - run params is at: /home/github/edgeai-tensorlab/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20250916-155615/yolox_s_lite/run.yaml
INFO: ModelMaker - running training - for detailed info see the log file: /home/github/edgeai-tensorlab/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20250916-155615/yolox_s_lite/training/run.log
TASKS TOTAL=1, NUM_RUNNING=1:   0%|                                                   | 0/1 [00:00<?, ?it/s, postfix={'RUNNING': ['20250916-155615/yolox_s_lite:training'], 'COMPLETED': []}]
ERROR:20250916-155618: Error occurred: 20250916-155615/yolox_s_lite:training - Error Code: 1 at /home/xilutek/github/edgeai-tensorlab/edgeai-benchmark/edgeai_benchmark/utils/parallel_runner.py
TASKS TOTAL=1, NUM_RUNNING=0: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.81s/it, postfix={'RUNNING': [], 'COMPLETED': ['yolox_s_lite']}]
Trained model is at: /home/github/edgeai-tensorlab/edgeai-modelmaker/data/projects/tiscapes2017_driving/run/20250916-155615/yolox_s_lite/training

WARNING: ModelMaker - Training completed with errors.

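The console output above only reports `Error Code: 1`; the actual cause is in the `training/run.log` it points to. When comparing several runs, a small helper can pull out the last exception line and any `undefined symbol` it mentions. This is a hypothetical convenience script (the name `scan_run_log.py` is made up), not part of edgeai-modelmaker; it only applies regular expressions to the log text.

```python
# Hypothetical helper (scan_run_log.py): extract the final exception from a log.
import re
import sys
from typing import Optional, Tuple

# Matches lines such as "ImportError: ..." or "RuntimeError: ..."
EXC_LINE = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$")


def last_exception(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (exception_type, message) for the last exception line, or None."""
    found = None
    for line in log_text.splitlines():
        match = EXC_LINE.match(line.strip())
        if match:
            found = (match.group(1), match.group(2))
    return found


def undefined_symbol(message: str) -> Optional[str]:
    """Return the mangled name from an 'undefined symbol: X' message, if any."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Usage: python scan_run_log.py <path/to/training/run.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        result = last_exception(fh.read())
    if result is None:
        print("no exception line found")
    else:
        exc_type, message = result
        print(f"{exc_type}: {message}")
        symbol = undefined_symbol(message)
        if symbol:
            print(f"undefined symbol: {symbol}")
```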

Here is the training run.log. How can I fix this? Thanks.

cat run.log
Traceback (most recent call last):
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/tools/train.py", line 23, in <module>
    from mmdeploy.utils import save_model_proto
  File "/home/github/edgeai-tensorlab/edgeai-mmdeploy/mmdeploy/__init__.py", line 4, in <module>
    from mmdeploy.utils import get_root_logger
  File "/home/github/edgeai-tensorlab/edgeai-mmdeploy/mmdeploy/utils/__init__.py", line 7, in <module>
    from .utils import get_file_path, get_root_logger, target_wrapper, build_model_from_cfg
  File "/home/github/edgeai-tensorlab/edgeai-mmdeploy/mmdeploy/utils/utils.py", line 15, in <module>
    from mmdet.apis import init_detector
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/apis/__init__.py", line 2, in <module>
    from .det_inferencer import DetInferencer
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/apis/det_inferencer.py", line 22, in <module>
    from mmdet.evaluation import INSTANCE_OFFSET
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/evaluation/__init__.py", line 4, in <module>
    from .metrics import *  # noqa: F401,F403
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/evaluation/metrics/__init__.py", line 5, in <module>
    from .coco_metric import CocoMetric
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/evaluation/metrics/coco_metric.py", line 16, in <module>
    from mmdet.datasets.api_wrappers import COCO, COCOeval, COCOevalMP
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/datasets/__init__.py", line 31, in <module>
    from .utils import get_loading_pipeline
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/datasets/utils.py", line 5, in <module>
    from mmdet.datasets.transforms import LoadAnnotations, LoadPanopticAnnotations
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/datasets/transforms/__init__.py", line 6, in <module>
    from .formatting import (ImageToTensor, PackDetInputs, PackReIDInputs,
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/datasets/transforms/formatting.py", line 11, in <module>
    from mmdet.structures.bbox import BaseBoxes
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/structures/bbox/__init__.py", line 2, in <module>
    from .base_boxes import BaseBoxes
  File "/home/xilutek/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/structures/bbox/base_boxes.py", line 9, in <module>
    from mmdet.structures.mask.structures import BitmapMasks, PolygonMasks
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/structures/mask/__init__.py", line 3, in <module>
    from .structures import (BaseInstanceMasks, BitmapMasks, PolygonMasks,
  File "/home/github/edgeai-tensorlab/edgeai-mmdetection/mmdet/structures/mask/structures.py", line 12, in <module>
    from mmcv.ops.roi_align import roi_align
  File "/home/.pyenv/versions/py310/lib/python3.10/site-packages/mmcv/ops/__init__.py", line 3, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/.pyenv/versions/py310/lib/python3.10/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/.pyenv/versions/py310/lib/python3.10/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/.pyenv/versions/3.10.18/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/.pyenv/versions/py310/lib/python3.10/site-packages/mmcv/_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationESs

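
The undefined symbol demangles to `c10::Error::Error(c10::SourceLocation, std::string)`, a constructor in PyTorch's c10 library, and the trailing `Ss` is how the pre-C++11 libstdc++ ABI spells `std::string`. A common cause of this pattern (an assumption here, not confirmed by the log) is that mmcv's compiled `_ext` was built against a different PyTorch build than the 2.7.1+cu128 now installed. The sketch below only gathers evidence: it reports whether torch was built with the C++11 ABI (`torch.compiled_with_cxx11_abi()`), tries to import `mmcv._ext` directly, and applies a rough heuristic to the mangled name. The helper names are hypothetical. If torch reports the C++11 ABI while the symbol uses the old one, rebuilding mmcv from source against the installed torch is the usual direction to look.

```python
# Sketch: evidence for a PyTorch / mmcv binary mismatch. Nothing is modified.
# string_abi_hint(), probe() and torch_summary() are hypothetical helpers.
import importlib
import sys

SYMBOL = "_ZN3c105ErrorC2ENS_14SourceLocationESs"  # copied from run.log


def string_abi_hint(mangled: str) -> str:
    """Rough heuristic: which libstdc++ std::string ABI a mangled name uses.

    The C++11 ABI mangles std::string through the std::__cxx11 namespace;
    the old ABI uses the 'Ss' substitution. This is not a full demangler.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


def probe(module: str) -> str:
    """Try to import a module and report the error text on failure."""
    try:
        importlib.import_module(module)
    except (ImportError, OSError) as exc:  # undefined symbols raise ImportError
        return f"FAIL {module}: {exc}"
    return f"OK   {module}"


def torch_summary() -> str:
    """Report the torch version and whether it was built with the C++11 ABI."""
    try:
        import torch
    except ImportError:
        return "torch is not importable"
    return (f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"C++11 ABI: {torch.compiled_with_cxx11_abi()}")


if __name__ == "__main__":
    print(torch_summary())
    for name in ("mmcv", "mmcv._ext"):
        print(probe(name))
    symbol = sys.argv[1] if len(sys.argv) > 1 else SYMBOL
    print(f"{symbol}: std::string ABI looks {string_abi_hint(symbol)}")
```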