Hello, I am using SDK version 9.1. I compiled the yolov5s6 model with input sizes of 640*640 and 640*384 (width * height). The inference time of the 640*640 model is nearly 60 ms, and the inference time of the 640*384 model is nearly 40 ms. I would like to ask whether these inference times are normal.
Hello,
These inference times sound reasonable to me given the complexity of the model. We do not have publicly available benchmarks for this model on AM62A that I can share as a reference. You can get finer-grained information by using the get_TI_benchmark_data() function call on the runtime object to see timestamps for the start and end of different operations: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/5dbf1a3fa3aa9736a6cc84eaacbc3cd5e11ebc74/examples/osrt_python/ort/onnxrt_ep.py#L74
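For reference, a minimal sketch of what that can look like on the target. The model path, artifacts folder, and input shape below are placeholders for your own compiled model, and the timestamp key names follow the linked onnxrt_ep.py example:

import numpy as np
import onnxruntime as rt

# Placeholder paths -- point these at your compiled model and TIDL artifacts.
model_path = "yolov5s6_640x384.onnx"
artifacts = "./model-artifacts/yolov5s6/"

so = rt.SessionOptions()
sess = rt.InferenceSession(
    model_path,
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"artifacts_folder": artifacts}, {}],
    sess_options=so,
)

# Dummy NCHW input just to exercise the runtime; shape/dtype depend on how
# the model was exported, and real measurements should use preprocessed frames.
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 384, 640).astype(np.float32)
sess.run(None, {input_name: dummy})

# Timestamps (nanoseconds) for the overall run and for each offloaded
# subgraph's copy-in, processing, and copy-out phases.
stats = sess.get_TI_benchmark_data()
total_ms = (stats["ts:run_end"] - stats["ts:run_start"]) / 1e6
print(f"run window: {total_ms:.2f} ms")
for key in sorted(stats):
    print(key, stats[key])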
May I ask what your expectations are?
Best Regards,
Reese
Hi Reese, since the compute capability is 2 TOPS, I would expect the inference time to be shorter. I used the provided demo directly for testing; it returns and displays the inference time and the total time. Is this inference time the model inference time?
The left picture shows the inference result, which appears to be correct. The right picture shows the inference time reported by the demo. The input size of the ONNX model is (640, 384).
Hello,
Yes, the dl-inference time is how long it took to run inference on the model. It is captured at the user level with timestamps taken before and after the call to the model. The source is here for reference: https://github.com/TexasInstruments/edgeai-gst-apps/blob/7b8756f65c59701e403b0a149e41eb41d43956ea/apps_python/infer_pipe.py#L109
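As an illustration only (not the exact demo code), the measurement amounts to something like this, where run_model is a hypothetical callable wrapping the runtime session:

import time

def timed_inference(run_model, frame):
    """Wall-clock timing around the model call, which is roughly what the
    demo reports as dl-inference time. Includes any Python/OS overhead on
    top of the time spent on the accelerator."""
    start = time.time()
    output = run_model(frame)
    stop = time.time()
    return output, (stop - start) * 1000.0  # milliseconds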
Note that if you are using the provided SDK without modifications, you may not be seeing 100% of the device's performance. By default, the Linux SDK configuration clocks the C7xMMA (AI accelerator) at 850 MHz instead of 1 GHz, based on a hardware configuration used on older boards; a device tree overlay is available to enable the full clock speed.
I have also noticed that performance can be variable if this overlay is not used. This is due to power modes affecting interrupt latency on the A53 cores running Linux: the inference call does not return to the Python/C++ process promptly, even though inference on the accelerator has already completed. You can see detailed timestamps of these operations by calling the get_TI_benchmark_data() function on the OSRT runtime, as in the sketch below.
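A rough way to compare the two views (reusing the session from the earlier sketch; key names and nanosecond units follow the onnxrt_ep.py example, and input_name/input_tensor are assumed to already exist):

import time

# Wall-clock time as seen by the Python process.
t0 = time.time()
sess.run(None, {input_name: input_tensor})
t1 = time.time()
wall_ms = (t1 - t0) * 1000.0

# Run window as reported by the runtime itself (nanoseconds).
stats = sess.get_TI_benchmark_data()
runtime_ms = (stats["ts:run_end"] - stats["ts:run_start"]) / 1e6

# A large gap here points to scheduling/interrupt latency on the A53 cores
# rather than time spent on the C7xMMA.
print(f"wall-clock around run(): {wall_ms:.2f} ms")
print(f"runtime-reported window: {runtime_ms:.2f} ms")
print(f"difference:              {wall_ms - runtime_ms:.2f} ms")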
-Reese