Hi,
I have some questions about optimization on the C6000 DSP.
1. I tested the performance of 16-bit * 16-bit and 32-bit * 32-bit multiplication on the C6678 and DM648. My code is as follows:
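short input_array1[256];
short input_array2[256];
int output_array[256];
int i, j;

/* Initialize the inputs and clear the output. */
for(i = 0; i < 256; i++)
{
    input_array1[i] = i;
    input_array2[i] = i + 10;
    output_array[i] = 0;
}

/* Read the time-stamp counter just before the timed loop. */
TSCLBegin = TSCL;
#pragma MUST_ITERATE(256, 256, 8)
#pragma UNROLL(4)
for(j = 0; j < 256; j++)
{
    /* 16-bit * 16-bit multiply, result stored at the loop index j. */
    output_array[j] = input_array1[j]*input_array2[j];
}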
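For anyone who wants to reproduce the test, the same kernel can be sketched as a self-contained function. This is only a minimal sketch under assumptions: the names fill_inputs and mul16_kernel are hypothetical, the restrict qualifiers are an addition that assumes the three arrays never overlap, and the TI pragmas are guarded by __TI_COMPILER_VERSION__ so the code also builds with a host compiler. It does not measure cycles and says nothing about what either device actually achieves.

```c
#include <stddef.h>

/* Hypothetical helper: fills the inputs the same way as the test above. */
static void fill_inputs(short *a, short *b, int *out)
{
    int i;
    for (i = 0; i < 256; i++) {
        a[i] = (short)i;
        b[i] = (short)(i + 10);
        out[i] = 0;
    }
}

/* Hypothetical kernel: 16-bit * 16-bit multiply with a 32-bit result.
 * Assumption: a, b and out never alias, which is what restrict promises
 * to the compiler. */
static void mul16_kernel(const short *restrict a,
                         const short *restrict b,
                         int *restrict out)
{
    int j;
#ifdef __TI_COMPILER_VERSION__
    /* TI-specific loop hints, only seen by the TI compiler. */
    #pragma MUST_ITERATE(256, 256, 8)
    #pragma UNROLL(4)
#endif
    for (j = 0; j < 256; j++) {
        /* The store uses the loop counter j, so every element is written. */
        out[j] = a[j] * b[j];
    }
}
```

Calling fill_inputs followed by mul16_kernel on three 256-element arrays gives out[j] = j * (j + 10). Putting the loop in its own function is a design choice that makes it easier to inspect the compiler's software-pipelining feedback for this loop alone.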
The performance on the C6678 and DM648 is as follows:
However, the datasheet says that the multiply performance of the C66x DSP is 4x that of the C64x+ DSP.
Also, MUST_ITERATE has no effect on the performance.
Could you give me some advice on the optimization?
Thank you.
Tianxing