Part Number: TMDSEVM572X
Hi everyone,
I ran the dgemm example on my AM572x device with BLIS_IC_NT=2, so the computation runs on 2 ARM cores.
With matrix size M=N=K=1024, the example takes 0.435s on 2-ARM and 0.64s on 2-DSP.
This is the output file:
1263.dgemm_time_ARM_2_cores.dat
I've also checked the performance for M=N=K=1000: it is 0.786s on 2-ARM and 0.55s on 2-DSP, which matches what TI reports here:
http://www.ti.com/processors/dsp/libraries/linear-algebra.html?keyMatch=cblas&tisearch=Search-EN-Everything
It is surprising that the larger matrix size (1024) takes less time than size 1000 when running on 2-ARM.
Does this show that the ARM has better performance than the DSP when running LINALG (dgemm)?
Besides, this is the output file when the example runs on 1-ARM.
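
To compare the timings above on equal footing, it may help to convert them to throughput. This is only a rough sketch: it assumes the usual 2*M*N*K floating-point operation count for dgemm, and the helper name `dgemm_gflops` is made up for illustration (it is not part of LINALG or BLIS). The times are the ones quoted in this post.

```python
def dgemm_gflops(m, n, k, seconds):
    """Achieved GFLOP/s, assuming dgemm costs 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


# Timings quoted in this post: (label, M=N=K, elapsed seconds)
timings = [
    ("2-ARM", 1024, 0.435),
    ("2-DSP", 1024, 0.64),
    ("2-ARM", 1000, 0.786),
    ("2-DSP", 1000, 0.55),
]

for label, size, seconds in timings:
    rate = dgemm_gflops(size, size, size, seconds)
    print(f"{label} M=N=K={size}: {rate:.2f} GFLOP/s")
```

Looking at GFLOP/s rather than raw seconds makes it easier to see whether the 1024 case on ARM is really faster per operation than the 1000 case, or whether the difference is just the problem size.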