PROCESSOR-SDK-J784S4: [J784S4 | TIDL 10.01] C7 DSP TopK operator slower than CPU

Ghassen souissi

Hello,

I am working on a J784S4 device with RTOS SDK 10_01_00_04 and TIDL 10_01_00_01. The ONNX version I am using is 1.15.0.

I am using the TopK layer implementation provided in the TI SDK without any modifications.

In one of my models, I have a TopK operator with input shape float32[1,50,7168,1] and K=300. When running inference, I measured the following results:

On CPU: 8.13 ms
On C7 DSP: >1 second

To investigate further, I built a much smaller test model containing only a single TopK node with input shape float32[1,1,100,1] and K=30 (the model is attached as 1665.topk_model.zip). For this case, I observed:

On CPU: 0.10 ms
On C7 DSP: 0.91 ms

Again, the inference time on C7 was noticeably higher than on CPU, which is the opposite of what I expected.

Could you please help me understand why the C7 DSP shows slower performance than the CPU for the TopK operator? Is this behavior related to the operator implementation in the SDK?

Do you also have any recommendations or solutions for improving the performance of TopK on C7?

Thank you in advance.

3 days ago

Chris Tsongas 3 days ago

TI__Genius 14510 points

Hi Ghassen,

How are you measuring the inference times? Is the model really running on the C7x? I ask because the C7x is ~30x faster than most PC CPUs.

Regards,

Chris

Ghassen souissi 2 days ago in reply to Chris Tsongas

Prodigy 120 points

Hello Chris,

Yes, I was surprised too, because the C7 should be faster than the CPU.
I am using get_TI_benchmark_data to get the inference times.
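
For readers who want to see where the measured time goes, here is a minimal sketch of how the dictionary returned by get_TI_benchmark_data could be split into processing time and copy time. The key names ('ts:run_start', 'ts:subgraph_<id>_proc_start', and so on) and the nanosecond unit are assumptions based on how TI's example scripts appear to read this dictionary; they are not confirmed in this thread, and summarize_benchmark is a hypothetical helper name.

```python
def summarize_benchmark(stats):
    """Split a TIDL benchmark dict into total, processing, and copy times (ms).

    Assumed (not confirmed in this thread): values are nanosecond timestamps
    keyed as 'ts:run_start', 'ts:run_end', and, per offloaded subgraph,
    'ts:subgraph_<id>_<phase>_start' / '..._end' with phase in
    copy_in, proc, copy_out.
    """
    prefix, suffix = "ts:subgraph_", "_proc_start"
    # Recover the subgraph ids from the '..._proc_start' keys.
    ids = [k[len(prefix):-len(suffix)] for k in stats
           if k.startswith(prefix) and k.endswith(suffix)]

    def span(sub_id, phase):
        # Duration of one phase of one subgraph, in the dict's native unit.
        return (stats[f"{prefix}{sub_id}_{phase}_end"]
                - stats[f"{prefix}{sub_id}_{phase}_start"])

    ns_to_ms = 1e-6
    proc = sum(span(i, "proc") for i in ids)
    copy = sum(span(i, "copy_in") + span(i, "copy_out") for i in ids)
    total = stats["ts:run_end"] - stats["ts:run_start"]
    return {
        "num_subgraphs": len(ids),
        "total_ms": total * ns_to_ms,
        "proc_ms": proc * ns_to_ms,
        "copy_ms": copy * ns_to_ms,
    }
```

If the assumptions hold, calling summarize_benchmark(sess.get_TI_benchmark_data()) after a run would show whether the total is dominated by proc_ms (the operator itself) or by copy_ms and other overhead.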

Could you please help me understand why the C7 DSP shows slower performance than the CPU for the TopK operator?

You can use my model to check the inference times.

Chris Tsongas 3 hours ago in reply to Ghassen souissi

TI__Genius 14510 points

Hi Ghassen,

Looking at your model, I do not think this is a valid test. It is a single-layer model, so the measured time is dominated by moving data from the ARM to the C7x, not by the execution of the TopK itself. A better test would use a model with more layers, or load the model once and run it 1000 times in a loop. I ran the loop test, and the single-frame inference time was 0.00033 seconds (about 0.33 ms), or 2987.225799 FPS.

Version	Model	Total Inference Time, 1000 Cycles (s)	Single Frame Inference (s)	FPS
10.01.04.00	topk_model.onnx	0.334759	0.000335	2987.2258
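
The loop measurement described above can be sketched roughly as follows. This is not the script used to produce the table; time_inference and its parameters are hypothetical names, and the workload in the usage section is a plain-Python placeholder. On the target, run_once would instead wrap a single call to an already-created session, for example sess.run(None, {input_name: data}).

```python
import time


def time_inference(run_once, iterations=1000, warmup=10):
    """Call run_once() repeatedly and report average latency and FPS."""
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    total_s = time.perf_counter() - start

    return {
        "total_s": total_s,
        "per_frame_ms": total_s / iterations * 1e3,
        "fps": iterations / total_s,
    }


if __name__ == "__main__":
    # Placeholder workload: top 30 of 100 values, sorted in pure Python.
    values = list(range(100))
    result = time_inference(lambda: sorted(values, reverse=True)[:30])
    print(f"total {result['total_s']:.6f} s, "
          f"{result['per_frame_ms']:.6f} ms/frame, {result['fps']:.1f} FPS")
```

Averaging over many runs with the model loaded once amortizes session setup and first-call overhead, which is why the per-frame figure can differ so much from a single cold measurement.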

Regards,

Chris
