AM62A7-Q1: Object Detection Inaccuracy When Input Frame Rate Exceeds Output Frame Rate

Part Number: AM62A7-Q1
Other Parts Discussed in Thread: AM62A7, SK-AM62A-LP

Hi, TI experts,
I’m encountering challenges with object detection using the YOLOX-S model on an AM62A7 board and would like to seek advice. Here’s the context:

When processing a 30fps input video, the system’s output frame rate is limited to ~20fps, leading to noticeable misalignment in detected bounding boxes (the boxes fail to track moving vehicles accurately). However, with a 10fps input, the output frame rate matches exactly, and the bounding boxes align correctly.

Key question:
When the input frame rate exceeds the output frame rate (e.g., 30fps input vs. 20fps output), are there specific optimizations or configuration adjustments—such as frame dropping strategies, model quantization, inference pipeline tuning, or hardware resource allocation—that can improve detection accuracy, even if the output frame rate does not fully match the input?

I’m particularly interested in methods to ensure precise bounding box localization despite the frame rate mismatch. Any insights or technical suggestions would be highly valuable!
Thanks.
SDK: 10.01.00.05
Hardware Platform: SK-AM62A-LP
  • Hello Howie, 

    The YOLOX-S model will indeed run more slowly than a 30 FPS stream. I would expect a lighter variant such as YOLOX-tiny or YOLOX-nano not to show this behavior.

    Frame drops are the most likely cause of this. How are you running the pipeline? Optiflow or apps_python/apps_cpp?

    The right strategy here is to measure the achievable FPS for the model -- inference will be the critical stage of the pipeline that limits throughput. From there, I suggest the application drop frames down to that measured framerate (in GStreamer, the videorate plugin with drop-only=true) before they reach the AI model; see the sketch below.
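
    As a rough illustration (not the exact Optiflow/edgeai-gst-apps pipeline), here is a minimal Python/GStreamer sketch of that idea. The camera device, caps, and the 15/1 target rate are assumptions -- substitute your own source and the FPS you measured for YOLOX-S:

        # Sketch: cap the frame rate in front of the inference branch so the model
        # only ever sees frames it can keep up with (all values are assumptions).
        import gi
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst

        Gst.init(None)

        pipeline = Gst.parse_launch(
            "v4l2src device=/dev/video0 ! "
            "video/x-raw,width=1280,height=720,framerate=30/1 ! "
            # drop-only=true only discards excess frames, it never duplicates any
            "videorate drop-only=true ! video/x-raw,framerate=15/1 ! "
            "videoconvert ! fakesink"  # fakesink stands in for the pre-proc + inference branch
        )

        pipeline.set_state(Gst.State.PLAYING)
        bus = pipeline.get_bus()
        bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                               Gst.MessageType.ERROR | Gst.MessageType.EOS)
        pipeline.set_state(Gst.State.NULL)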

    What's happening is that the postprocessing is drawing on a newer frame than the one inference ran on. In the apps_python/apps_cpp flow, the original frame that corresponds to the inference result can be dropped during the long inference call, before the application has pulled it in through appsink, so the boxes end up overlaid on a later frame.
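
    To make the failure mode concrete, here is a hypothetical sketch (placeholder names, OpenCV capture just for illustration -- not the actual apps_python code) of the safe pattern: overlay the detections on the exact frame that inference consumed, rather than on whatever frame is newest when inference returns:

        # Hypothetical sketch: keep the results paired with the frame inference consumed.
        import time
        import cv2

        def run_inference(frame):
            # Placeholder for the TIDL call; the sleep stands in for ~50 ms of latency.
            time.sleep(0.05)
            return []  # list of (x1, y1, x2, y2) boxes

        cap = cv2.VideoCapture(0)  # assumed V4L2 camera at index 0
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            boxes = run_inference(frame)      # slow call; newer frames arrive meanwhile
            for (x1, y1, x2, y2) in boxes:    # draw on the SAME frame, not a fresh one
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.imshow("detections", frame)
            if cv2.waitKey(1) == 27:          # Esc to quit
                break
        cap.release()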

    Your question is fairly open-ended on strategies to mitigate this. Either the model selected for TIDL inference should be fast enough to keep up with the inter-frame latency of your camera source, OR you should only call inference at a rate the accelerator can sustain (if that means dropping frames, you should do so).
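
    If you prefer to handle this inside the application loop rather than in the GStreamer graph, a simple decimation scheme does the same job; CAMERA_FPS and MODEL_FPS below are assumed values you would replace with your own measurements:

        # Sketch: hand only every Nth frame to the accelerator, where N comes from
        # the camera rate and the measured model throughput (assumed values).
        import math

        CAMERA_FPS = 30   # source rate
        MODEL_FPS = 20    # measured TIDL throughput for the model
        SKIP = math.ceil(CAMERA_FPS / MODEL_FPS)   # -> 2: infer on every 2nd frame

        for index in range(300):          # stands in for the capture loop
            if index % SKIP != 0:
                continue                  # drop the frame before any pre-processing
            # ... run pre-processing + inference + postprocessing here ...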

    Some use-cases will run multiple models at different rates, but if you have a single model, it is best to pick one whose inference time is <= the inter-frame latency of the source. Otherwise, the frames you end up dropping simply waste CPU cycles and DDR bandwidth.
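
    As a quick sanity check on those numbers (the 50 ms figure is only an assumed YOLOX-S latency -- substitute what you measure on your board):

        camera_fps = 30
        inference_ms = 50                      # assumed model latency per frame
        inter_frame_ms = 1000.0 / camera_fps   # ~33.3 ms budget at 30 fps
        print(inference_ms <= inter_frame_ms)  # False -> the model cannot keep up
        print(1000.0 / inference_ms)           # ~20 FPS, matching the output rate you see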

    BR,
    Reese