TMDSEVM572X: DGEMM example performance

Part Number: TMDSEVM572X

Hi everyone,

I ran the dgemm example on my AM572x device. I set BLIS_IC_NT=2 so that the computation runs on 2 ARM cores.

For matrix size M=N=K=1024, the example takes 0.435 s on 2 ARM cores and 0.64 s on 2 DSP cores.

These are the output files:

1263.dgemm_time_ARM_2_cores.dat 

7776.dgemm_time_DSP.dat

I've also checked the performance for M=N=K=1000: it is 0.786 s on 2 ARM cores and 0.55 s on 2 DSP cores, the same as TI reports here:

http://www.ti.com/processors/dsp/libraries/linear-algebra.html?keyMatch=cblas&tisearch=Search-EN-Everything

It is surprising that the larger matrix size (1024) takes less time than size 1000 when running on 2 ARM cores.

Does this show that the ARM has better performance than the DSP when running LINALG (dgemm)?
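
To compare runs with different matrix sizes on an equal footing, it may help to convert the times above into throughput. The sketch below assumes the usual flop count of 2*M*N*K for dgemm; the helper name `dgemm_gflops` is only illustrative and is not part of the LINALG example. The timings are the ones quoted in this post.

```python
def dgemm_gflops(m, n, k, seconds):
    """Approximate GFLOPS for one dgemm call, assuming 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9


# (label, matrix size M=N=K, measured time in seconds) as quoted in this post
runs = [
    ("2 ARM cores", 1024, 0.435),
    ("2 DSP cores", 1024, 0.64),
    ("2 ARM cores", 1000, 0.786),
    ("2 DSP cores", 1000, 0.55),
]

for label, size, seconds in runs:
    gflops = dgemm_gflops(size, size, size, seconds)
    print(f"{label}, M=N=K={size}: {gflops:.2f} GFLOPS")
```

By this estimate the throughput on 2 ARM cores roughly doubles between size 1000 and size 1024, while the DSP figures stay much closer together, so the ARM result is the one that changes sharply with matrix size.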

Besides, this is the output file when the example runs on 1 ARM core:

1200.dgemm_time_ARM_1_core.dat