Hello,
I am working on a J784S4 device with RTOS SDK 10_01_00_04 and TIDL 10_01_00_01. The ONNX version I am using is 1.15.0.
I am using the TopK layer implementation provided in the TI SDK without any modifications.
In one of my models, I have a TopK operator with input shape float32[1,50,7168,1] and K=300. When running inference, I measured the following results:
- On CPU: 8.13 ms
- On C7 DSP: >1 second
To investigate further, I built a much smaller test model containing only a single TopK node with input shape float32[1,1,100,1] and K=30. (You can find this model here: 1665.topk_model.zip) For this case, I observed:
- On CPU: 0.10 ms
- On C7 DSP: 0.91 ms
Again, the inference time on C7 was noticeably higher than on CPU, which is the opposite of what I expected.
Could you please help me understand why the C7 DSP shows slower performance than the CPU for the TopK operator? Is this behavior related to the operator implementation in the SDK?
Do you also have any recommendations or solutions for improving the performance of TopK on C7?
Thank you in advance.