Hi,
I am looking at the performance of matrix operations and am seeing execution times
2.5 to 5 times longer than expected when calling the DSPlib function
DSPF_sp_lud.
My setup:
CCStudio 6.0.1.00040
DSPlib c66x 3.4.0.0
Running on a K2H12 (EVMK2H evaluation board from Advantech)
Essential code:
#include <c6x.h>
...
// A: float array, allocated on the stack
// L, U: float pointers, space allocated on the heap (malloc)
// P: unsigned short pointer, space allocated on the heap (malloc)
TSCL = 0;
DSPF_sp_lud(64, &A, L, U, P);
cycles = TSCL;
When run once, the reported cycles value is about 1.88 million, which is quite
a difference from the 718676 cycles given in the library's test report
for order-64 matrices.
Running the above code inside a loop actually makes subsequent runs SLOWER,
at about 3.23 million cycles.
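For reference, here is a minimal sketch of how the looped measurement described above could be structured so that each run is timed as a start/stop difference rather than by writing to TSCL before every call. As far as I understand, a write to TSCL only starts the counter and does not reset it, but that is an assumption on my part. The names read_cycles, start_counter, time_runs and placeholder_kernel are hypothetical helpers, not part of DSPlib, and the clock() fallback exists only so the sketch also builds on a host compiler.

```c
#include <stdint.h>
#include <time.h>

#ifdef _TMS320C6X
#include <c6x.h>                /* TI compiler: declares TSCL and TSCH */
#endif

/* Start the time-stamp counter once. Assumption: any write to TSCL starts
 * the counter and the written value is ignored. No-op on a host build. */
static void start_counter(void)
{
#ifdef _TMS320C6X
    TSCL = 0;
#endif
}

/* Read a free-running counter. On the DSP this is the 64-bit TSCH:TSCL pair;
 * on any other compiler fall back to clock() so the sketch still builds. */
static uint64_t read_cycles(void)
{
#ifdef _TMS320C6X
    uint32_t lo = TSCL;         /* read the low half first */
    uint32_t hi = TSCH;
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();
#endif
}

typedef void (*kernel_fn)(void *ctx);

/* Hypothetical stand-in for the routine under test; it only counts calls. */
static void placeholder_kernel(void *ctx)
{
    (*(int *)ctx)++;
}

/* Time `runs` back-to-back calls of fn and store one count per call in out[].
 * The cost of two adjacent counter reads is subtracted from each result. */
static void time_runs(kernel_fn fn, void *ctx, int runs, uint64_t *out)
{
    uint64_t t0, overhead, start, elapsed;
    int i;

    start_counter();
    t0 = read_cycles();
    overhead = read_cycles() - t0;

    for (i = 0; i < runs; i++) {
        start = read_cycles();
        fn(ctx);
        elapsed = read_cycles() - start;
        out[i] = (elapsed > overhead) ? (elapsed - overhead) : 0;
    }
}
```

On the target, placeholder_kernel would be replaced by a small wrapper around the DSPF_sp_lud call shown earlier. Comparing out[0] with the later entries should show whether the per-call cost really grows from run to run.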
So the simple question is - what is wrong?
I realize that the cycle counts cited in the test report come from running the
lib in the CCS5 simulator, but that should hardly explain a difference in running time
of over 100%?
On another note: with a clock frequency of 1.2 GHz and a reported peak performance
of 19.2 GFLOPS, we are looking at 16 floating-point operations per cycle. This is of course
a theoretical maximum. What performance could reasonably be expected from the DSPlib in real life?
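To put the cycle counts next to the peak figure, here is a rough back-of-the-envelope sketch. It assumes the textbook leading-order flop count of (2/3)*n^3 for an n x n LU decomposition, which may well undercount what DSPF_sp_lud actually does (pivoting, writing out L, U and P). The function names are made up for illustration.

```c
/* Approximate flop count of an n x n LU decomposition.
 * Assumption: leading-order term (2/3) * n^3 only. */
static double lu_flops(int n)
{
    return 2.0 * (double)n * (double)n * (double)n / 3.0;
}

/* Achieved floating-point operations per cycle for one decomposition. */
static double flops_per_cycle(int n, double cycles)
{
    return lu_flops(n) / cycles;
}

/* Peak floating-point operations per cycle from peak GFLOPS and clock in GHz. */
static double peak_flops_per_cycle(double gflops, double ghz)
{
    return gflops / ghz;
}
```

With n = 64 this gives roughly 175 thousand flops, so the 718676 cycles from the test report correspond to about 0.24 flops per cycle, my 1.88 million cycles to about 0.09, and the 3.23 million cycles to about 0.05, against a peak of 19.2 / 1.2 = 16.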
Regards
/Anders Klint