SK-AM62A-LP: Issue with Training YOLOX using Model Maker on GPU

Part Number: SK-AM62A-LP

Hello,

I am encountering an error while trying to train a YOLOX model with Model Maker on my GPU.

I am receiving the following error message:

AttributeError: DataContainer has no attribute size for type <class 'list'>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1)

I noticed that this issue has been reported by others on your GitHub repository (https://github.com/TexasInstruments/edgeai-modelmaker/issues/11).

Is there a workaround that would let me get past this error and train on my GPU?

Thank you for your attention to this issue.

Best regards

  • I have the same issue. I just tried turning off distributed training (probably only valid for a single-GPU setup). It doesn't look like it's using my GPU yet, but at least it no longer throws the error. It may work fully on your setup.

    edgeai-modelmaker/edgeai-modelmaker/ai-modules/vision/params.py

    training=dict(distributed=False)

  • Thank you Derek.

    I've managed to resolve my issue by updating MMCV. I am using MMCV 1.7.2, and I changed the version constraint in edgeai-mmdetection/mmdet/__init__.py from

    mmcv_maximum_version = '1.5.0'

    to

    mmcv_maximum_version = '1.7.2'

    After that, I was able to train on two GPUs without any errors.
    I hope this helps you as well.

  • That works for me, thank you!
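
For readers wondering why MMCV 1.7.2 is refused until the constraint is edited, here is a minimal sketch of the kind of version-bounds check involved. It is a simplified illustration, not the actual code in mmdet/__init__.py: the helper names `digit_version` and `mmcv_within_bounds` and the lower bound `1.3.17` are assumptions made for the example; only the two maximum values come from the reply above.

```python
import re


def digit_version(version: str) -> tuple:
    """Turn '1.7.2' into (1, 7, 2) so versions compare numerically, not as strings."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def mmcv_within_bounds(installed: str, minimum: str, maximum: str) -> bool:
    """Return True when minimum <= installed <= maximum (hypothetical helper)."""
    return digit_version(minimum) <= digit_version(installed) <= digit_version(maximum)


# Hypothetical lower bound, chosen only for this illustration.
MMCV_MINIMUM = "1.3.17"

# With the original upper bound, an installed MMCV 1.7.2 is rejected.
print(mmcv_within_bounds("1.7.2", MMCV_MINIMUM, "1.5.0"))  # False

# After raising the upper bound to 1.7.2, the same install passes.
print(mmcv_within_bounds("1.7.2", MMCV_MINIMUM, "1.7.2"))  # True
```

Note that raising the upper bound only skips the check. It does not by itself guarantee that every code path is compatible with the newer MMCV, although two people in this thread report that training worked afterwards.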