Hello!
We are moving our product from Analog Devices' Blackfin BF533 to TI's DM6446 and have run some PIO memory benchmarks. The results for the DaVinci were worse than expected:
(2) READ
Read ( Buffer = 1048576 B, Iteration = 10 ): 87.381 MB/s
Read ( Buffer = 524288 B, Iteration = 20 ): 87.381 MB/s
Read ( Buffer = 262144 B, Iteration = 40 ): 87.381 MB/s
Read ( Buffer = 131072 B, Iteration = 80 ): 87.381 MB/s
Read ( Buffer = 65536 B, Iteration = 160 ): 87.381 MB/s
Read ( Buffer = 32768 B, Iteration = 320 ): 87.381 MB/s
Read ( Buffer = 16384 B, Iteration = 640 ): 87.381 MB/s
Read ( Buffer = 8192 B, Iteration = 1280 ): 149.797 MB/s
Read ( Buffer = 4096 B, Iteration = 2560 ): 149.797 MB/s
Read ( Buffer = 2048 B, Iteration = 5120 ): 149.797 MB/s
Read ( Buffer = 1024 B, Iteration = 10240 ): 131.072 MB/s
Read ( Buffer = 512 B, Iteration = 20480 ): 149.797 MB/s
Read ( Buffer = 256 B, Iteration = 40960 ): 149.797 MB/s
Read ( Buffer = 128 B, Iteration = 81920 ): 149.797 MB/s
Read ( Buffer = 64 B, Iteration = 163840 ): 131.072 MB/s
Read ( Buffer = 32 B, Iteration = 327680 ): 131.072 MB/s
Read ( Buffer = 16 B, Iteration = 655360 ): 131.072 MB/s
Read ( Buffer = 8 B, Iteration = 1310720 ): 104.858 MB/s
Read ( Buffer = 4 B, Iteration = 2621440 ): 87.381 MB/s
And the results for the BF533:
CPU: 500 MHz
SDR: 133 MHz
Bus width: 16 bit
(2) READ
Read ( Buffer = 1048576 B, Iteration = 10 ): 183.916 MB/s
Read ( Buffer = 524288 B, Iteration = 20 ): 184.163 MB/s
Read ( Buffer = 262144 B, Iteration = 40 ): 184.164 MB/s
Read ( Buffer = 131072 B, Iteration = 80 ): 184.162 MB/s
Read ( Buffer = 65536 B, Iteration = 160 ): 184.161 MB/s
Read ( Buffer = 32768 B, Iteration = 320 ): 1063.946 MB/s
Read ( Buffer = 16384 B, Iteration = 640 ): 1098.820 MB/s
Read ( Buffer = 8192 B, Iteration = 1280 ): 1098.525 MB/s
Read ( Buffer = 4096 B, Iteration = 2560 ): 1097.187 MB/s
Read ( Buffer = 2048 B, Iteration = 5120 ): 1094.575 MB/s
Read ( Buffer = 1024 B, Iteration = 10240 ): 1089.335 MB/s
Read ( Buffer = 512 B, Iteration = 20480 ): 1078.901 MB/s
Read ( Buffer = 256 B, Iteration = 40960 ): 1058.647 MB/s
Read ( Buffer = 128 B, Iteration = 81920 ): 1020.290 MB/s
Read ( Buffer = 64 B, Iteration = 163840 ): 951.351 MB/s
Read ( Buffer = 32 B, Iteration = 327680 ): 838.095 MB/s
Read ( Buffer = 16 B, Iteration = 655360 ): 676.923 MB/s
Read ( Buffer = 8 B, Iteration = 1310720 ): 488.889 MB/s
Read ( Buffer = 4 B, Iteration = 2621440 ): 314.284 MB/s
Is anyone else seeing the same memory read speeds, or is something misconfigured in the default settings of MontaVista Linux with kernel 2.6.18?
Any feedback is appreciated!
Regards,
Pámer Bálint
The test program:
-----------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// Runs three tests (memcpy, read, memset). Each step halves the buffer and
// doubles the iteration count, so every output line moves 10 MiB in total.
void MemTest() {
    const int min_copy_count = 10;
    const int max_buff_size = 1024 * 1024;
    int *buffer_dst = ( int * ) malloc( max_buff_size );
    int *buffer_src = ( int * ) malloc( max_buff_size );
    int buff_size;
    int copy_count;
    double t, dt;

    if ( buffer_dst == NULL || buffer_src == NULL ) {
        printf( "malloc failed\n" );
        free( buffer_dst );
        free( buffer_src );
        return;
    }
    // Give the source buffer defined contents before it is read.
    memset( buffer_src, 1, max_buff_size );

    printf( "\n(1) MEMCPY\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            memcpy( buffer_dst, buffer_src, buff_size );
        }
        // dt is in milliseconds, so bytes / dt / 1000 gives MB/s.
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        printf(
            "MemCopy ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    printf( "\n(2) READ\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        // Unsigned so that wrap-around is well defined and matches %x.
        unsigned int checksum = 0;
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            const int *src = buffer_src;
            for ( int j = 0; j < buff_size / 4; j++ ) {
                checksum += ( unsigned int ) *( src++ );
            }
        }
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        // Printing the checksum keeps the compiler from dropping the read loop.
        printf(
            "Read ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s ( checksum = %x )\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000, checksum
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    printf( "\n(3) WRITE\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        int data = 42;
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            memset( buffer_dst, data, buff_size );
        }
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        printf(
            "MemSet ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    free( buffer_dst );
    free( buffer_src );
}

// Standalone entry point.
int main( void ) {
    MemTest();
    return 0;
}
-----------------------------------------------
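A note on timer resolution: the DaVinci figures above take only four distinct values (87.381, 104.858, 131.072 and 149.797 MB/s). For the 10485760 bytes moved per line these correspond to elapsed times of 120, 100, 80 and 70 ms, so clock() appears to advance in 10 ms steps on that system. Below is a minimal sketch of the read test timed with clock_gettime( CLOCK_MONOTONIC ) instead. The helper names now_seconds, read_pass and read_bandwidth are made up for this illustration and are not part of the program above, and it is an assumption that the target toolchain provides CLOCK_MONOTONIC (older glibc versions also need -lrt when linking).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in seconds, taken from the monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sum `words` 32-bit words. The volatile qualifier forces every load to
 * be performed, so the compiler cannot remove the loop. */
static uint32_t read_pass(const volatile uint32_t *src, size_t words) {
    uint32_t checksum = 0;
    for (size_t j = 0; j < words; j++) {
        checksum += src[j];
    }
    return checksum;
}

/* Read `bytes` bytes from `buf` `iterations` times.
 * Returns the throughput in MB/s (1 MB = 10^6 bytes), or 0.0 if the
 * elapsed time was too short to measure. */
static double read_bandwidth(const uint32_t *buf, size_t bytes,
                             unsigned iterations, uint32_t *checksum_out) {
    assert(buf != NULL && checksum_out != NULL && iterations > 0);

    uint32_t checksum = 0;
    double t0 = now_seconds();
    for (unsigned i = 0; i < iterations; i++) {
        checksum += read_pass(buf, bytes / sizeof(uint32_t));
    }
    double dt = now_seconds() - t0;

    *checksum_out = checksum;
    return dt > 0.0 ? ((double)bytes * iterations) / dt / 1e6 : 0.0;
}
```

To use it, call read_bandwidth( ( const uint32_t * ) buffer_src, buff_size, copy_count, &checksum ) inside the READ loop and print the returned value. Unlike clock(), this measures wall-clock time rather than CPU time, so the numbers are only comparable on an otherwise idle system.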