Is C6accel really fast?

I am trying to use the MATH (float) functions from the c6accel_1_01_00_07 library on my OMAP-L138 Experimenter Kit board.

The problem is that they run very slowly compared with the standard +, -, *, / operators.

Let me show you a simple example. I have this code:

float Two = 2.0; 
float a,b,c;
clock_gettime(clockID, &startTime);
for (i=0; i<1000; i++) {
   a = i;
   C6accel_MATH_powsp(hC6accel,&a,&Two, &b, 1);
   C6accel_MATH_divsp_i(hC6accel,&a, &b, &c, 1);
   C6accel_MATH_mpysp_i(hC6accel,&a, &c,  &b, 1); 
   C6accel_MATH_subsp_i(hC6accel,&b, &a,  &c, 1); 
   C6accel_MATH_addsp_i(hC6accel,&a, &c,  &b, 1); 
}
clock_gettime(clockID, &endTime); 
printf("\nTime of execution (c6accel math): %ld sec.\n", (long)(endTime.tv_sec - startTime.tv_sec));

// This code performs the same computation as the code above
clock_gettime(clockID, &startTime);  
for (i=0; i<1000; i++) {
   a = i;
   b=pow(a,Two);
   c=a/b; 
   b=a*c; 
   c=b-a;  
   b=a+c; 	
}
clock_gettime(clockID, &endTime); 
printf("\nTime of execution (standard): %ld sec.\n", (long)(endTime.tv_sec - startTime.tv_sec));

The results are very disappointing. Here they are:

Time of execution (c6accel math): 7 sec.

Time of execution (standard): 0 sec.
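
Both figures are whole seconds because only tv_sec is compared, so "0 sec" means no second boundary was crossed between the two samples (at most about one second), not that the loop was free. A minimal sketch of a finer-grained measurement, assuming the same POSIX struct timespec samples as in the code above; the helper name elapsed_ms is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: milliseconds elapsed between two timespec samples.
   Uses both tv_sec and tv_nsec, so sub-second runs are not reported as 0. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);

    double sec  = (double)(end->tv_sec - start->tv_sec);
    /* The nanosecond difference may be negative; the seconds term compensates. */
    double nsec = (double)(end->tv_nsec - start->tv_nsec);

    return sec * 1000.0 + nsec / 1.0e6;
}
```

It could replace the subtraction in the two printf calls, for example printf("%.3f ms\n", elapsed_ms(&startTime, &endTime));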

Please tell me: do I need to do some special setup before using the c6accel functions?

P.S. Compilation options: -march=armv5t -mtune=arm9tdmi -mabi=aapcs-linux -O -DPLATFORM=138 

Thank you.
  • Maybe the calls to the DSP are time-consuming? Maybe I need to use memory shared with the DSP (Memory_CONTIGHEAP, Memory_CACHED)? 

  • Myosotis,

    Offloading a task from the ARM to the DSP is useful only when a large chunk of data needs to be processed. In your case you are offloading just one word of data per call, so the inter-processor communication overhead outweighs the performance benefit of running the task on the DSP. This is explained in detail, with graphs, in the following section of the C6Accel documentation. Refer to it to evaluate which tasks are suitable for offloading to the DSP.

    http://processors.wiki.ti.com/index.php/C6EZAccel_FAQ#What_is_the_inter-processor_overhead_involved_in_C6Accel_.3F
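
As a rough illustration using only the numbers posted in the question: 1000 iterations with 5 offloaded calls each is 5000 calls in about 7 seconds, or roughly 1.4 ms per call. The sketch below just does that arithmetic. The function names are hypothetical, and the second function assumes the per-call cost stays the same for larger buffers, which is an approximation since copying and cache maintenance grow with buffer size.

```c
#include <assert.h>

/* Average cost of one offloaded call in milliseconds, from a measured total. */
static double per_call_ms(double total_sec, long iterations, long calls_per_iteration)
{
    long calls = iterations * calls_per_iteration;
    assert(calls > 0);
    return total_sec * 1000.0 / (double)calls;
}

/* Estimated call overhead in milliseconds for a given number of calls,
   ASSUMING a fixed per-call cost (an approximation, see the note above). */
static double overhead_ms(long calls, double cost_per_call_ms)
{
    return (double)calls * cost_per_call_ms;
}
```

With the posted figures, per_call_ms(7.0, 1000, 5) gives about 1.4 ms. If each of the five operations were issued once on a 1000-element array, overhead_ms(5, 1.4) suggests call overhead on the order of 7 ms rather than 7 s, under the fixed-cost assumption.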

    Let us know if you have any further questions.

    Regards,

    Rahul

    PS: Your observation regarding the use of shared memory is correct: all buffers passed from the ARM to the DSP need to come from CMEM and should be configured as contiguous and cached.
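
A minimal host-side sketch of how the loop from the question might be restructured so that each operation runs once over a whole array instead of once per element. The *_ref functions are hypothetical plain-C stand-ins, not C6accel functions; they only mirror the argument order used in the question (input, input, output, number of points). On the target, each pass would presumably become one C6accel call with the last argument set to n, and the arrays would be allocated from CMEM as described above. pow(a, 2) is written as a*a to keep the sketch free of the math library; the powsp call would be batched the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the element-wise kernels: r[i] = x[i] op y[i]. */
static void mpysp_ref(const float *x, const float *y, float *r, int n)
{
    for (int i = 0; i < n; i++) r[i] = x[i] * y[i];
}

static void divsp_ref(const float *x, const float *y, float *r, int n)
{
    for (int i = 0; i < n; i++) r[i] = x[i] / y[i];
}

static void subsp_ref(const float *x, const float *y, float *r, int n)
{
    for (int i = 0; i < n; i++) r[i] = x[i] - y[i];
}

static void addsp_ref(const float *x, const float *y, float *r, int n)
{
    for (int i = 0; i < n; i++) r[i] = x[i] + y[i];
}

/* Same computation as the loop in the question, but as five array passes.
   a, b and c must each hold n floats. */
static void run_batched(float *a, float *b, float *c, int n)
{
    assert(a != NULL && b != NULL && c != NULL && n > 0);

    for (int i = 0; i < n; i++) a[i] = (float)i;   /* fill the input once */

    mpysp_ref(a, a, b, n);   /* b = a^2 (stands in for the powsp step) */
    divsp_ref(a, b, c, n);   /* c = a / b */
    mpysp_ref(a, c, b, n);   /* b = a * c */
    subsp_ref(b, a, c, n);   /* c = b - a */
    addsp_ref(a, c, b, n);   /* b = a + c */
}
```

The point of the restructuring is that the number of calls no longer depends on the number of elements: five calls cover all n points, so the fixed inter-processor cost is paid five times instead of 5 * n times.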