Hello, I am using SDK version 9.1. I compiled the yolov5s6 model with input sizes of 640*640 and 640*384 (width * height). The inference time of the 640*640 model is nearly 60 ms, and the inference time of the 640*384 model is nearly 40 ms. I would like to ask whether these inference times are normal.
Hello,
These inference times sound reasonable to me given the complexity of the model. We do not have publicly available benchmarks for this model on AM62A that I can share as a reference. You can get finer-grained information by using the get_TI_benchmark_data() function call on the runtime object to see timestamps for the start and end of different operations: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/5dbf1a3fa3aa9736a6cc84eaacbc3cd5e11ebc74/examples/osrt_python/ort/onnxrt_ep.py#L74
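For reference, a minimal sketch of what that can look like on the target. The model path, artifacts folder, and input shape below are placeholders for your own compiled model, and the timestamp key names follow the linked onnxrt_ep.py example:

import numpy as np
import onnxruntime as rt

# Placeholder paths -- point these at your compiled model and TIDL artifacts.
model_path = "yolov5s6_640x384.onnx"
artifacts = "./model-artifacts/yolov5s6/"

so = rt.SessionOptions()
sess = rt.InferenceSession(
    model_path,
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"artifacts_folder": artifacts}, {}],
    sess_options=so,
)

# Dummy NCHW input just to exercise the runtime; shape/dtype depend on how
# the model was exported, and real measurements should use preprocessed frames.
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 384, 640).astype(np.float32)
sess.run(None, {input_name: dummy})

# Timestamps (nanoseconds) for the overall run and for each offloaded
# subgraph's copy-in, processing, and copy-out phases.
stats = sess.get_TI_benchmark_data()
total_ms = (stats["ts:run_end"] - stats["ts:run_start"]) / 1e6
print(f"run window: {total_ms:.2f} ms")
for key in sorted(stats):
    print(key, stats[key])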
May I ask what your expectations are?
Best Regards,
Reese
Hi Reese, since the compute capability is 2 TOPS, I would expect the inference time to be shorter. I used the provided demo directly for testing; it returns and displays the inference time and the total time. Is this inference time the model inference time?
The left picture shows the inference result, which appears to be correct. The right picture shows the inference time reported by the demo. The input size of the ONNX model is (640, 384).
Hello,
Yes, the dl-inference time is how long it took to run inference on the model. It is captured at the user level with timestamps taken before and after the call to the model. The source is here for reference: https://github.com/TexasInstruments/edgeai-gst-apps/blob/7b8756f65c59701e403b0a149e41eb41d43956ea/apps_python/infer_pipe.py#L109
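As an illustration only (not the exact demo code), the measurement amounts to something like this, where run_model is a hypothetical callable wrapping the runtime session:

import time

def timed_inference(run_model, frame):
    """Wall-clock timing around the model call, which is roughly what the
    demo reports as dl-inference time. Includes any Python/OS overhead on
    top of the time spent on the accelerator."""
    start = time.time()
    output = run_model(frame)
    stop = time.time()
    return output, (stop - start) * 1000.0  # milliseconds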
Note that if you are using the provided SDK without modifications, you may not be seeing 100% of the device's performance. By default, the Linux SDK configuration clocks the C7xMMA (AI accelerator) at 850 MHz instead of 1 GHz, based on a hardware configuration used on older boards; a device tree overlay is available to enable the full clock speed.
I have also noticed that performance can be variable if this overlay is not used. This is due to power modes affecting interrupt latency on the A53 cores running Linux: the inference call does not return to the Python/C++ process promptly, even though inference on the accelerator has already completed. You can see detailed timestamps of these operations by calling the get_TI_benchmark_data() function on the OSRT runtime, as in the sketch below.
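A rough way to compare the two views (reusing the session from the earlier sketch; key names and nanosecond units follow the onnxrt_ep.py example, and input_name/input_tensor are assumed to already exist):

import time

# Wall-clock time as seen by the Python process.
t0 = time.time()
sess.run(None, {input_name: input_tensor})
t1 = time.time()
wall_ms = (t1 - t0) * 1000.0

# Run window as reported by the runtime itself (nanoseconds).
stats = sess.get_TI_benchmark_data()
runtime_ms = (stats["ts:run_end"] - stats["ts:run_start"]) / 1e6

# A large gap here points to scheduling/interrupt latency on the A53 cores
# rather than time spent on the C7xMMA.
print(f"wall-clock around run(): {wall_ms:.2f} ms")
print(f"runtime-reported window: {runtime_ms:.2f} ms")
print(f"difference:              {wall_ms - runtime_ms:.2f} ms")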
-Reese