J784S4XEVM: Performance Measurement of Custom DNN Model

Part Number: J784S4XEVM


I would like to deploy a custom object detection model on the board. I generated the model artifacts with OSRT Python (github.com/.../osrt_python). My goal is to measure on-board performance, for example inference latency, the number of DSPs utilized, overall utilization, and related metrics.

I found the perf_stats tool (github.com/.../perf_stats), which prints utilization data similar to nvidia-smi. However, the output is not smooth: each inference takes only a few milliseconds, so I could not clearly interpret the performance results. I also tried increasing the refresh rate by reducing the sleep interval, but I still could not fully capture the performance.
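For context, one approach that may help with the short-inference problem is to time many back-to-back runs instead of a single one, so that latency is averaged over a longer window and the board stays under steady load while perf_stats samples it. The sketch below is only an illustration under assumptions: `measure_latency` and `fake_inference` are hypothetical names, and `fake_inference` is a placeholder that would need to be replaced with the actual call that runs the model.

```python
import statistics
import time


def measure_latency(run_once, warmup=10, iters=200):
    """Time `run_once` over many back-to-back calls; results are in milliseconds."""
    # Warm-up runs are excluded so one-time setup cost does not skew the numbers.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "iters": iters,
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def fake_inference():
    # Hypothetical stand-in for one inference; replace with the real model call.
    time.sleep(0.002)


if __name__ == "__main__":
    print(measure_latency(fake_inference, warmup=5, iters=50))
```

Raising `iters` so the loop runs for several seconds should give a utilization monitor enough time to sample a steady state, although this measures host-side wall-clock latency only and not per-core utilization.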

My questions are:

- Is there any configuration I need to change in order to get smoother statistics?

- Is there another method to capture performance and utilization information for custom models?

Thank you