Unexpectedly high measured cycle counts for DSPlib matrix call

Hi,

I am looking at the performance of matrix operations and see execution times
that are 2.5 to 5 times longer than expected when calling the DSPlib function
DSPF_sp_lud:

My setup:
CCStudio 6.0.1.00040
DSPlib c66x 3.4.0.0
Running on a K2H12 (EVMK2H evaluation board from Advantech)

Essential code:
#include <c6x.h>
...
// A: float array, allocated on the stack
// L, U: float pointers, space allocated on the heap (malloc)
// P: unsigned short pointer, space allocated on the heap (malloc)

TSCL = 0;
DSPF_sp_lud(64, &A, L, U, P);
cycles = TSCL;

When run once, the cycles value reported is about 1.88 million which is quite
a difference from the 718676 cycles reported in the test report of the library
for order-64 matrices.
Running the above code inside a loop will actually make subsequent runs SLOWER
at about 3.23 million cycles.
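
One thing I want to rule out is the measurement itself. As far as I understand the C66x CPU documentation, a write to TSCL only starts the counter and the written value is ignored, so "TSCL = 0" would not reset a counter that is already running. If that is right, taking the difference of two reads is the safer pattern. Below is a minimal sketch of that idea; read_ticks, elapsed_ticks and time_call are names I made up for illustration, and the clock() fallback is only there so the code also builds on a host PC.

```c
#include <stdint.h>
#include <time.h>

#ifdef _TMS320C6X                 /* predefined by the TI C6000 compiler */
#include <c6x.h>
/* Assumption: the counter was started earlier by any write to TSCL. */
static uint32_t read_ticks(void) { return TSCL; }
#else
/* Host fallback for illustration only: clock() ticks, not CPU cycles. */
static uint32_t read_ticks(void) { return (uint32_t)clock(); }
#endif

/* Unsigned subtraction stays correct across one 32-bit wrap-around. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Time one call of fn(arg) and subtract the cost of reading the counter. */
static uint32_t time_call(void (*fn)(void *), void *arg)
{
    uint32_t t0 = read_ticks();
    uint32_t t1 = read_ticks();   /* t1 - t0 approximates the read overhead */
    fn(arg);
    uint32_t t2 = read_ticks();

    uint32_t overhead = elapsed_ticks(t0, t1);
    uint32_t total = elapsed_ticks(t1, t2);
    return (total > overhead) ? (total - overhead) : 0u;
}
```

The DSPF_sp_lud call would go into a small wrapper function passed as fn, so that every iteration of the loop reports its own delta rather than an absolute counter value.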

So the simple question is - what is wrong?

I realize that the cycle counts cited in the test report come from running the
lib in the CCS5 simulator, but that should hardly explain a running time that is
more than twice as long...?

On another note - with a clock frequency of 1.2 GHz and a reported peak performance
of 19.2 GFlops we are looking at 16 floating-point operations per cycle. This is of
course a theoretical maximum - what performance could reasonably be expected from
the DSPlib in real life?
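
To put numbers on this, here is a rough back-of-the-envelope sketch. It assumes the textbook operation count of about (2/3)*N^3 floating-point operations for an LU decomposition; the real DSPF_sp_lud also builds L, U and the permutation P explicitly, so it may well do more work than that. The function names are mine, not part of DSPlib.

```c
/* Peak floating-point operations per cycle from peak FLOPS and clock rate. */
static double peak_flops_per_cycle(double flops_per_second, double clock_hz)
{
    return flops_per_second / clock_hz;
}

/* Assumption: textbook LU cost of about (2/3) * n^3 operations. */
static double lu_flop_estimate(int n)
{
    return 2.0 * (double)n * (double)n * (double)n / 3.0;
}

/* Operations per cycle actually achieved for a given cycle count. */
static double achieved_flops_per_cycle(int n, double cycles)
{
    return lu_flop_estimate(n) / cycles;
}

/* How many times slower a measured run is than the reference figure. */
static double slowdown(double measured_cycles, double reference_cycles)
{
    return measured_cycles / reference_cycles;
}
```

With N = 64 the estimate is roughly 175,000 operations. Against the 718676 cycles from the test report that is about 0.24 operations per cycle, so under this assumption even the reference figure is a small fraction of the 16 per cycle peak. My measured 1.88 million and 3.23 million cycles correspond to slowdowns of about 2.6 and 4.5 relative to the report.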

Regards
/Anders Klint