
DM6446 PIO memory speed

Hello!

 

We are moving our product from Analog Devices' Blackfin BF533 to TI's DM6446. We have run some PIO memory benchmarks, and the results were worse than expected for the DaVinci:

 (2) READ
Read ( Buffer = 1048576 B, Iteration = 10 ): 87.381 MB/s
Read ( Buffer = 524288 B, Iteration = 20 ): 87.381 MB/s
Read ( Buffer = 262144 B, Iteration = 40 ): 87.381 MB/s
Read ( Buffer = 131072 B, Iteration = 80 ): 87.381 MB/s
Read ( Buffer = 65536 B, Iteration = 160 ): 87.381 MB/s
Read ( Buffer = 32768 B, Iteration = 320 ): 87.381 MB/s
Read ( Buffer = 16384 B, Iteration = 640 ): 87.381 MB/s
Read ( Buffer = 8192 B, Iteration = 1280 ): 149.797 MB/s
Read ( Buffer = 4096 B, Iteration = 2560 ): 149.797 MB/s
Read ( Buffer = 2048 B, Iteration = 5120 ): 149.797 MB/s
Read ( Buffer = 1024 B, Iteration = 10240 ): 131.072 MB/s
Read ( Buffer = 512 B, Iteration = 20480 ): 149.797 MB/s
Read ( Buffer = 256 B, Iteration = 40960 ): 149.797 MB/s
Read ( Buffer = 128 B, Iteration = 81920 ): 149.797 MB/s
Read ( Buffer = 64 B, Iteration = 163840 ): 131.072 MB/s
Read ( Buffer = 32 B, Iteration = 327680 ): 131.072 MB/s
Read ( Buffer = 16 B, Iteration = 655360 ): 131.072 MB/s
Read ( Buffer = 8 B, Iteration = 1310720 ): 104.858 MB/s
Read ( Buffer = 4 B, Iteration = 2621440 ): 87.381 MB/s

 


And here are the results for the BF533:

CPU: 500 MHz
SDR SDRAM: 133 MHz
Bus width: 16 bit



 (2) READ
Read ( Buffer = 1048576 B, Iteration = 10 ): 183.916 MB/s
Read ( Buffer = 524288 B, Iteration = 20 ): 184.163 MB/s
Read ( Buffer = 262144 B, Iteration = 40 ): 184.164 MB/s
Read ( Buffer = 131072 B, Iteration = 80 ): 184.162 MB/s
Read ( Buffer = 65536 B, Iteration = 160 ): 184.161 MB/s
Read ( Buffer = 32768 B, Iteration = 320 ): 1063.946 MB/s
Read ( Buffer = 16384 B, Iteration = 640 ): 1098.820 MB/s
Read ( Buffer = 8192 B, Iteration = 1280 ): 1098.525 MB/s
Read ( Buffer = 4096 B, Iteration = 2560 ): 1097.187 MB/s
Read ( Buffer = 2048 B, Iteration = 5120 ): 1094.575 MB/s
Read ( Buffer = 1024 B, Iteration = 10240 ): 1089.335 MB/s
Read ( Buffer = 512 B, Iteration = 20480 ): 1078.901 MB/s
Read ( Buffer = 256 B, Iteration = 40960 ): 1058.647 MB/s
Read ( Buffer = 128 B, Iteration = 81920 ): 1020.290 MB/s
Read ( Buffer = 64 B, Iteration = 163840 ): 951.351 MB/s
Read ( Buffer = 32 B, Iteration = 327680 ): 838.095 MB/s
Read ( Buffer = 16 B, Iteration = 655360 ): 676.923 MB/s
Read ( Buffer = 8 B, Iteration = 1310720 ): 488.889 MB/s
Read ( Buffer = 4 B, Iteration = 2621440 ): 314.284 MB/s

Is anyone else seeing the same memory read speeds, or is something misconfigured in the default settings of MontaVista Linux with kernel 2.6.18?

 

Any feedback is appreciated!

 

Regards,

Pámer Bálint

 

 

The test program:

-----------------------------------------------

#include <time.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy, memset */

/* Requires C99 (e.g. gcc -std=c99) for the loop-scoped declarations. */
void MemTest( void ) {

    const int min_copy_count = 10;
    const int max_buff_size = 1024 * 1024;
    int *buffer_dst = ( int * ) malloc( max_buff_size );
    int *buffer_src = ( int * ) malloc( max_buff_size );
    int buff_size;
    int copy_count;
    double t, dt;

    if ( buffer_dst == NULL || buffer_src == NULL ) {
        fprintf( stderr, "malloc failed\n" );
        free( buffer_dst );
        free( buffer_src );
        return;
    }
    /* buffer_src is never written by the tests, so give it defined
       contents before it is read. */
    memset( buffer_src, 1, max_buff_size );

    /* Each row halves the buffer and doubles the iteration count, so every
       row moves the same total of 10 x 1 MiB = 10485760 bytes.
       dt is clock() processor time in ms; bytes / ms / 1000 = MB/s (10^6 B/s). */

    printf( "\n(1) MEMCPY\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        int *dst = buffer_dst;
        int *src = buffer_src;
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            memcpy( dst, src, buff_size );
        }
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        printf(
            "MemCopy ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    printf( "\n(2) READ\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        /* unsigned so the running sum wraps instead of overflowing (UB for int);
           it is printed below so the loads cannot be optimized away */
        unsigned checksum = 0;
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            int *src = buffer_src;
            for ( int j = 0; j < buff_size / ( int ) sizeof( int ); j++ ) {
                checksum += ( unsigned ) *( src++ );
            }
        }
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        printf(
            "Read ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s ( checksum = %x )\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000, checksum
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    printf( "\n(3) WRITE\n" );
    buff_size = max_buff_size;
    copy_count = min_copy_count;
    while ( buff_size > 2 ) {
        int data = 42;   /* memset stores the low byte (0x2A) into every byte */
        int *dst = buffer_dst;
        t = clock();
        for ( int i = 0; i < copy_count; i++ ) {
            memset( dst, data, buff_size );
        }
        dt = ( clock() - t ) / ( CLOCKS_PER_SEC / 1000.0 );
        printf(
            "MemSet ( Buffer = %7i B, Iteration = %7i ): %.3f MB/s\n",
            buff_size, copy_count, ( copy_count * buff_size ) / dt / 1000
        );
        buff_size /= 2;
        copy_count *= 2;
    }

    free( buffer_dst );
    free( buffer_src );
}

int main( void ) {
    MemTest();
    return 0;
}

-----------------------------------------------

 

  • I am not too familiar with the BF533; is this a DSP part?  What operating system is it running?

    If you are running this test on the ARM side of the DM6446 (as opposed to the DSP), keep in mind that the ARM runs at less than 300 MHz, compared to 500 MHz for the BF533.  Also, if you are running Linux on the DM6446's ARM, consider that Linux is not a real-time operating system. In Linux, read calls are blocking and probably cause the scheduler to spend some time on other work (updating the clock, etc.) until the read comes back; if this happens several times, it can cause many unnecessary context switches that you would not see in a real-time OS.  Just some things to think about.