SK-AM68: How to run multiple models in parallel

Part Number: SK-AM68
Other Parts Discussed in Thread: AM68

I am using SDK_10.00.00.08 for product defect detection on the SK-AM68, running multiple models at 1920x1080 resolution.
For example, with a single input, each of the 4 models has an inference time of 20 ms when run alone. When all 4 models are run, the measured total inference time is about 80 ms. Is it possible to run multiple models in parallel, i.e., run 4 models with a total run time of 20~30 ms? Thanks.

  • Hi,

    The AM68 can support multiple input channels; the inferencing time/performance will depend on your models, their configuration, and memory speed.

    Inferencing is carried out as soon as a frame of data has been buffered. I think memory bandwidth is the key factor here, but I will check with my team on how to reduce your run time effectively. What is your frame rate requirement at 1080p resolution?

    Thanks and regards

    Wen Li  

  • Currently I am using 15 fps at 3280x2464 resolution.

    The following was tested at 1920x1080 resolution (15 fps), with timing prints added immediately before and after runModel to measure inference time (a minimal sketch of this timing pattern follows at the end of this post).

    As above: with a single input, each of the 4 models takes 20 ms when run alone, but running all 4 models takes about 80 ms in total.

    The single-input, multi-model test is therefore executing serially. Does the hardware (AM68) support multi-model parallel execution, and if so, has the software already implemented it?
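
    A minimal sketch of the timing measurement described above, assuming a Python/ONNX Runtime flow; the model path and input shape are placeholders, and "runModel" corresponds to the session's run call:

    ```python
    import time
    import numpy as np
    import onnxruntime as rt

    # Placeholder model path; substitute the actual compiled model.
    sess = rt.InferenceSession("model.onnx")
    name = sess.get_inputs()[0].name
    # Dummy 1920x1080 input; the real shape/layout depends on the model.
    inp = {name: np.zeros((1, 3, 1080, 1920), np.float32)}

    t0 = time.perf_counter()
    sess.run(None, inp)  # the "runModel" step being timed
    print(f"inference time: {(time.perf_counter() - t0) * 1e3:.1f} ms")
    ```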

  • Hello,

    The hardware does not support multiple models running truly in parallel. Running multiple models works by context-switching within the accelerator using the preemption feature. Preemption is enabled by default, to my knowledge.

    • My language is intentional here: multiple models can be loaded and can try to use the accelerator at the same time, but only one can be running at any instant in time.
    • By default, models run in the sequence in which their inferences were submitted (see the sketch after this list).
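
    As a hedged illustration of that submission behavior, here is a sketch in which four models are submitted from separate threads; the model paths and input shapes are placeholders. Even with concurrent submission, the accelerator context-switches between models rather than running them truly in parallel:

    ```python
    import threading
    import numpy as np
    import onnxruntime as rt

    def run_model(path):
        # One session per model; placeholder path and dummy input.
        sess = rt.InferenceSession(path)
        name = sess.get_inputs()[0].name
        inp = {name: np.zeros((1, 3, 1080, 1920), np.float32)}
        sess.run(None, inp)  # submitted concurrently, executed one at a time

    models = ["model_a.onnx", "model_b.onnx", "model_c.onnx", "model_d.onnx"]
    threads = [threading.Thread(target=run_model, args=(m,)) for m in models]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ```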

    A model context-switch can happen at the whole-model level or between layers of a model. You are seeing the former. The latter (layer-level) requires the setting "max_pre_empt_delay". The value is in milliseconds and is effectively an upper limit on how long one model can run before it is preempted.
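
    A hedged sketch of passing max_pre_empt_delay through the TIDL runtime options, assuming the ONNX Runtime flow with TI's TIDLExecutionProvider (available in the Processor SDK's ONNX Runtime build, not stock onnxruntime). The exact option key and whether it is applied at compile time or inference time may vary by SDK release, so please verify against the edgeai-tidl-tools documentation for SDK 10.x; the paths below are placeholders:

    ```python
    import onnxruntime as rt

    tidl_options = {
        "artifacts_folder": "/path/to/compiled_artifacts",  # placeholder
        # Option named in this thread: upper bound, in milliseconds, on how
        # long one model may run before it can be preempted.
        "max_pre_empt_delay": 5,
    }
    sess = rt.InferenceSession(
        "model.onnx",  # placeholder
        providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
        provider_options=[tidl_options, {}],
    )
    ```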

    BR,
    Reese

  • Thanks for your reply.