SK-AM68: How to run multiple models in parallel

Part Number: SK-AM68
Other Parts Discussed in Thread: AM68

I am using SDK_10.00.00.08 for product defect detection on the SK-AM68, running multiple models at 1920x1080 resolution.
For example, with a single input, each of the 4 models has an inference time of 20 ms when run alone. When all 4 models are run, the measured total inference time is about 80 ms. Is it possible to run multiple models in parallel, i.e., run 4 models with a total run time of 20~30 ms? Thanks.

  • Hi,

    The AM68 can support multiple input channels; the inferencing time/performance will depend on your models, their configuration, and memory speed.

    Inferencing is carried out as soon as a frame of data has been buffered. I think memory bandwidth is the key factor here, but I will check with my team on how to reduce your run time effectively. What is your frame rate requirement at 1080p resolution?

    Thanks and regards

    Wen Li  

  • Currently I am using 15 fps at 3280x2464 resolution.

    The following was tested at 1920x1080 resolution (15 fps), with timing prints added immediately before and after runModel to measure inference time (a minimal sketch of this timing pattern follows at the end of this post).

    As above: with a single input, each of the 4 models takes 20 ms when run alone, but running all 4 models takes about 80 ms in total.

    The single-input, multi-model test is therefore executing serially. Does the hardware (AM68) support multi-model parallel execution, and if so, has the software already implemented it?
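
    A minimal sketch of the timing measurement described above, assuming a Python/ONNX Runtime flow; the model path and input shape are placeholders, and "runModel" corresponds to the session's run call:

    ```python
    import time
    import numpy as np
    import onnxruntime as rt

    # Placeholder model path; substitute the actual compiled model.
    sess = rt.InferenceSession("model.onnx")
    name = sess.get_inputs()[0].name
    # Dummy 1920x1080 input; the real shape/layout depends on the model.
    inp = {name: np.zeros((1, 3, 1080, 1920), np.float32)}

    t0 = time.perf_counter()
    sess.run(None, inp)  # the "runModel" step being timed
    print(f"inference time: {(time.perf_counter() - t0) * 1e3:.1f} ms")
    ```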

  • Hello,

    The hardware does not support multiple models running truly in parallel. Running multiple models works by context-switching within the accelerator using the preemption feature. Preemption is enabled by default, to my knowledge.

    • My language is intentional here: multiple models can be loaded and can try to use the accelerator at the same time, but only one can be running at any instant in time.
    • By default, models run in the sequence in which their inferences were submitted (see the sketch after this list).
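
    As a hedged illustration of that submission behavior, here is a sketch in which four models are submitted from separate threads; the model paths and input shapes are placeholders. Even with concurrent submission, the accelerator context-switches between models rather than running them truly in parallel:

    ```python
    import threading
    import numpy as np
    import onnxruntime as rt

    def run_model(path):
        # One session per model; placeholder path and dummy input.
        sess = rt.InferenceSession(path)
        name = sess.get_inputs()[0].name
        inp = {name: np.zeros((1, 3, 1080, 1920), np.float32)}
        sess.run(None, inp)  # submitted concurrently, executed one at a time

    models = ["model_a.onnx", "model_b.onnx", "model_c.onnx", "model_d.onnx"]
    threads = [threading.Thread(target=run_model, args=(m,)) for m in models]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ```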

    A model context-switch can happen at the whole-model level or between layers of a model. You are seeing the former. The latter (layer-level) requires the setting "max_pre_empt_delay". The value is in milliseconds and is effectively an upper limit on how long one model can run before it is preempted.
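
    A hedged sketch of passing max_pre_empt_delay through the TIDL runtime options, assuming the ONNX Runtime flow with TI's TIDLExecutionProvider (available in the Processor SDK's ONNX Runtime build, not stock onnxruntime). The exact option key and whether it is applied at compile time or inference time may vary by SDK release, so please verify against the edgeai-tidl-tools documentation for SDK 10.x; the paths below are placeholders:

    ```python
    import onnxruntime as rt

    tidl_options = {
        "artifacts_folder": "/path/to/compiled_artifacts",  # placeholder
        # Option named in this thread: upper bound, in milliseconds, on how
        # long one model may run before it can be preempted.
        "max_pre_empt_delay": 5,
    }
    sess = rt.InferenceSession(
        "model.onnx",  # placeholder
        providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
        provider_options=[tidl_options, {}],
    )
    ```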

    BR,
    Reese

  • Thanks for your reply.