AM62A7: I compiled a custom Yolov5 model and would like to ask whether the inference time taken by the model is normal.

Part Number: AM62A7

Hello, the SDK version is 9.1, and I compiled the yolov5s6 model with input sizes of 640*640 and 640*384 (width * height). The inference time of the 640*640 model is nearly 60 ms, and the inference time of the 640*384 model is nearly 40 ms. I would like to ask whether these inference times are normal.

  • Hello,

    These inference times sound reasonable to me given the complexity of the model. We do not have publicly available benchmarks for this model on AM62A that I can share as a reference. You can see finer-grained information by using the get_TI_benchmark_data() function call on the runtime object to see timestamps for the start and end of different operations: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/5dbf1a3fa3aa9736a6cc84eaacbc3cd5e11ebc74/examples/osrt_python/ort/onnxrt_ep.py#L74
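
    As a rough, minimal sketch of that call in the ONNX Runtime OSRT flow (the model path, artifacts folder, provider option values, input name, and input shape below are placeholders, not values from this thread, and the option names should be checked against your SDK version):

        import numpy as np
        import onnxruntime as ort

        # TIDLExecutionProvider offloads the supported subgraphs to the C7xMMA accelerator.
        # "artifacts_folder" must point at the artifacts produced when the model was compiled.
        ep_options = {"artifacts_folder": "./model-artifacts/yolov5s6"}  # placeholder path
        sess = ort.InferenceSession(
            "yolov5s6_640x384.onnx",                                     # placeholder model file
            providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
            provider_options=[ep_options, {}],
        )

        # Dummy NCHW input for a 640x384 (width x height) model; the input name depends on your export.
        dummy = np.zeros((1, 3, 384, 640), dtype=np.float32)
        outputs = sess.run(None, {"images": dummy})

        # TI extension on the OSRT session: named timestamps for copy-in / processing / copy-out phases.
        stats = sess.get_TI_benchmark_data()   # dict of timestamp name -> value in the TI examples
        for name in stats:
            print(name, stats[name])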

    May I ask what are your expectations?

    Best Regards,
    Reese

  • Hi Reese, if the compute capability is 2 TOPS, I would expect the inference time to be shorter. I am testing directly with the demo, which reports and displays the inference time and the total time. Is this inference time the model's inference time?

  • The left picture shows the inference result, which appears to be correct. The right picture shows the inference time reported by the demo. The input size of the model (ONNX) is (640, 384).

  • Hello,

    Yes, the dl-inference time is how long it took to run the model. It is captured at the user level with timestamps taken before and after the call to the model. The source is here for reference: https://github.com/TexasInstruments/edgeai-gst-apps/blob/7b8756f65c59701e403b0a149e41eb41d43956ea/apps_python/infer_pipe.py#L109
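
    In other words, it is a wall-clock measurement around the inference call, similar in spirit to the sketch below (the function and variable names here are illustrative, not the actual demo code, which is at the link above):

        import time

        def timed_run(session, inputs):
            # Wall-clock time around a single inference call, in milliseconds.
            # This mirrors the idea behind the demo's dl-inference number.
            t0 = time.time()
            outputs = session.run(None, inputs)
            t1 = time.time()
            return outputs, (t1 - t0) * 1000.0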

    Note that if you are using the SDK as provided, without modifications, you may not be seeing 100% of the device's performance. The Linux SDK configuration defaults the C7xMMA (AI accelerator) to 850 MHz instead of 1 GHz, based on the hardware configuration of older boards.

    • You can check your board version near the USB-A port -- there is a string like PROC1352E3. What matters is E2 vs E3. E3 will be stable at the higher clock speed, whereas E2 may show stability issues in a long-running test or at higher temperatures. Both can be set to the maximum frequency.
    • Enable maximum performance by applying the device tree overlay in the uEnv.txt file in the BOOT partition:
      • add a line: name_overlays=k3-am62a7-sk-e3-max-opp.dtbo
      • If you have multiple overlays, the file names should be space separated
      • The line above works for the 9.1 SDK. For 9.0, add "ti/" in front of the filename
      • This sets the max C7xMMA clock to 1 GHz, A53 clocks to 1.4 GHz, and disables frequency scaling in Linux
    • You can check the C7x clock speed with the following:
      • k3conf dump clock 211 #the frequencies are in Hz
    • And set the clock similarly:
      • k3conf set clock 211 1000000000 #set C7 to 1 GHz for 2 TOPS performance - this gives 10-15% performance boost

    I have noticed that performance can be variable if this overlay is not used -- this is due to power modes affecting interrupt latency to the A53 cores running Linux. This reduces performance simply because the inference call does not return to the Python/C++ process soon enough, even though inference on the accelerator has already completed. You can see detailed timestamps of these operations with the get_TI_benchmark_data() function called on the OSRT runtime, as in the sketch below.
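
    For example, here is a minimal sketch of splitting the total run() time into accelerator processing time versus everything else. The timestamp key names are assumed from the edgeai-tidl-tools example scripts and may differ between SDK versions; timestamps are assumed to be in nanoseconds as in those examples:

        def summarize_ti_benchmark(stats):
            # stats: dict returned by sess.get_TI_benchmark_data() on the OSRT session.
            total_ns = stats["ts:run_end"] - stats["ts:run_start"]
            proc_ns = 0
            for key in stats:
                # Sum processing time across all offloaded subgraphs.
                if key.endswith("_proc_start"):
                    proc_ns += stats[key.replace("_proc_start", "_proc_end")] - stats[key]
            total_ms, proc_ms = total_ns / 1e6, proc_ns / 1e6
            print(f"total run() call       : {total_ms:.1f} ms")
            print(f"accelerator processing : {proc_ms:.1f} ms")
            print(f"other (copies, scheduling, interrupt latency): {total_ms - proc_ms:.1f} ms")

        # usage: summarize_ti_benchmark(sess.get_TI_benchmark_data())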

    -Reese