When I compile the model with TVM, I get the artifacts_**_target directory, which includes deploy_lib.so and subgraph0_net.
I then added the code below to 'test_tidl_j7_deploy.py' and ran it on the EVM; the measured time is shown below.
However, the model itself runs on the C71/MMA (excluding the unsupported layer ops) in only about ~10 ms, so I doubt this measurement is accurate.
How can I measure the time cost of the whole process? And how can I get the partial time spent on the Arm core for the parts optimized by TVM?
Thanks for any advice.
root@j7-evm:/opt/tvm/tests/python/relay/ti_tests#
root@j7-evm:/opt/tvm/tests/python/relay/ti_tests# python3 test_tidl_j7_deploy.py
Input image file: ./test.jpg
2020-11-19 18:13:14,606 INFO Could not find libdlr.so in model artifact. Using dlr from /usr/lib/python3.8/site-packages/dlr/libdlr.so
2020-11-19 18:13:14,606 INFO Could not find libdlr.so in model artifact. Using dlr from /usr/lib/python3.8/site-packages/dlr/libdlr.so
[18:13:14] ../src/dlr_tvm.cc:66: No metadata found
APP: Init ... !!!
MEM: Init ... !!!
MEM: Initialized DMA HEAP (fd=6) !!!
MEM: Init ... Done !!!
IPC: Init ... !!!
IPC: Init ... Done !!!
REMOTE_SERVICE: Init ... !!!
REMOTE_SERVICE: Init ... Done !!!
APP: Init ... Done !!!
248.343436 s: VX_ZONE_INIT:Enabled
248.343460 s: VX_ZONE_ERROR:Enabled
248.343465 s: VX_ZONE_WARNING:Enabled
248.346302 s: VX_ZONE_INIT:[tivxInit:71] Initialization Done !!!
248.348719 s: VX_ZONE_INIT:[tivxHostInit:48] Initialization Done for HOST !!!
[18:13:15] ../src/dlr.cc:162: No metadata file was found!
time: 0.5211689472198486 second
ONNX_LaneDet dlr runtime execution finished
ONNX_LaneDet execution finished
[[[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]]]
root@j7-evm:/opt/tvm/tests/python/relay/ti_tests#
The first inference call performs library loading and parameter initialization, so it takes longer than the subsequent calls.
To get a representative performance number, run inference for ~100 frames and average the times.
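The averaging approach above can be sketched as follows (a minimal sketch, assuming a DLR-style model object with a `run` method; the warm-up and frame counts are illustrative):

```python
import time

def average_inference_time(run_fn, n_warmup=5, n_frames=100):
    """Time run_fn over n_frames after n_warmup warm-up calls.

    The warm-up calls absorb one-time costs (library load,
    parameter initialization) so they don't skew the average.
    Returns the average seconds per call.
    """
    for _ in range(n_warmup):
        run_fn()
    start = time.perf_counter()
    for _ in range(n_frames):
        run_fn()
    return (time.perf_counter() - start) / n_frames

# With the DLR runtime this would look something like (hypothetical names):
#   model = dlr.DLRModel(artifacts_dir)
#   avg_s = average_inference_time(lambda: model.run({"input": input_data}))
#   print("average inference time: %.3f ms" % (avg_s * 1e3))
```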
We have also added the DLR model API "get_TI_benchmark_data" to get subgraph-level performance and total DDR bandwidth.
Refer to the example usage in "ti_dl/test/tflrt/tflrt_delegate.py". The same API is available for the TFLite/ONNXRT/DLR runtimes.
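As a sketch of turning that benchmark data into per-subgraph durations: the helper below pairs up start/end timestamps from the returned dict. The exact key names (`ts:subgraph_..._proc_start` / `..._proc_end`) and their units are assumptions based on the TIDL example scripts, so check tflrt_delegate.py for the real layout:

```python
def subgraph_proc_times(stats):
    """Pair up *_proc_start / *_proc_end timestamps from a
    get_TI_benchmark_data()-style dict and return per-subgraph
    durations, in whatever unit the runtime reports."""
    times = {}
    for key, start in stats.items():
        if key.endswith("_proc_start"):
            name = key[: -len("_proc_start")]
            end = stats.get(name + "_proc_end")
            if end is not None:
                times[name] = end - start
    return times

# Hypothetical usage with a DLR model object:
#   stats = model.get_TI_benchmark_data()
#   for name, dt in subgraph_proc_times(stats).items():
#       print(name, dt)
```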