I would like to deploy a custom object detection model on the board. I used the OSRT Python examples to generate the TIDL artifacts for the model (github.com/.../osrt_python). My goal is to measure its on-board performance: inference latency, how many DSP cores are used, overall utilization, and related metrics.
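For reference, this is roughly how I time a single inference on the board. It is a minimal sketch based on the onnxruntime flow in the osrt_python examples; the model path, artifacts folder, and the exact TIDLExecutionProvider options are placeholders from my setup:

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder paths; adjust to the compiled model and TIDL artifacts.
MODEL_PATH = "model.onnx"
ARTIFACTS_DIR = "artifacts"

# TIDL execution provider as used in the osrt_python examples;
# 'artifacts_folder' points at the compiled artifacts (my setup may
# pass additional provider options here).
sess = ort.InferenceSession(
    MODEL_PATH,
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"artifacts_folder": ARTIFACTS_DIR}, {}],
)

inp = sess.get_inputs()[0]
# Dummy input matching the model's input shape (float32 assumed;
# dynamic dimensions are replaced with 1).
dims = [d if isinstance(d, int) else 1 for d in inp.shape]
x = np.random.rand(*dims).astype(np.float32)

# Warm-up runs so one-time initialization doesn't skew the numbers.
for _ in range(5):
    sess.run(None, {inp.name: x})

# Average wall-clock latency over many iterations.
n = 200
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, {inp.name: x})
t1 = time.perf_counter()
print(f"avg latency: {(t1 - t0) / n * 1e3:.2f} ms")
```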
I found the perf_stats tool (github.com/.../perf_stats), which prints utilization data similar to nvidia-smi. However, the readings fluctuate heavily, and since a single inference completes in only a few milliseconds, I could not interpret the sampled utilization reliably. I also tried increasing the refresh rate by shortening the tool's sleep interval, but I still could not capture the load clearly.
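To give perf_stats a sustained load to sample, I also run inferences back-to-back for a fixed duration while perf_stats runs in a second terminal. A sketch, reusing sess, inp, and x from the timing loop above:

```python
import time

# Keep the accelerator busy for ~30 s so perf_stats has a steady
# load to sample in another terminal.
DURATION_S = 30.0
deadline = time.perf_counter() + DURATION_S
count = 0
while time.perf_counter() < deadline:
    sess.run(None, {inp.name: x})
    count += 1
print(f"{count} inferences in {DURATION_S:.0f} s "
      f"({count / DURATION_S:.1f} inferences/s)")
```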
My questions are:
- Is there any configuration I need to change in order to get smoother statistics?
- Is there another method to capture performance and utilization information for custom models?
Thank you