
Execution speed on embedded Linux vs. without an OS

Other Parts Discussed in Thread: AM1808, STARTERWARE-SITARA

To whom it may concern,

We are using a TI product, the Zoom AM1808 eXperimenter Kit from Logic PD.

We have run into a problem and would appreciate some advice.

We wrote a simple program (it calls a 3x3 mean filter 500 times on a 192x192 grayscale image) to compare execution time in two environments: embedded Linux and without an OS.

The results confuse us: the version without an OS is almost 9 times slower than the one on embedded Linux.

- Embedded Linux (arm2009-q1 compiler): 6.57 s
- Without OS (CCS 4.2 compiler): 57.16 s
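To put the two timings side by side, it helps to divide them by the amount of work. Treating each call as one pass over the 192x192 = 36,864-pixel image (an approximation, since the filter does not process the image border), 500 calls correspond to 18,432,000 pixels. That gives roughly 356 ns per pixel on Linux and roughly 3.1 µs per pixel without an OS, a ratio of about 8.7. The helper below is a hypothetical sketch of that arithmetic, not code from the thread:

```c
#include <assert.h>

/* Average time per processed pixel, in nanoseconds
   (hypothetical helper for comparing the two measurements). */
double ns_per_pixel(double total_s, long calls, long pixels_per_call)
{
    assert(calls > 0 && pixels_per_call > 0);
    return total_s * 1e9 / ((double)calls * (double)pixels_per_call);
}
```

For example, `ns_per_pixel(57.16, 500, 192L * 192L)` divided by `ns_per_pixel(6.57, 500, 192L * 192L)` reproduces the "almost 9 times" figure.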

Are any further settings required in CCS?

Could you give me some advice? Thank you.


///////Source Code////////////////

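#include <stdint.h>

/* BYTE (unsigned 8-bit) and the image/image2 buffers (192x192 each) are
   defined elsewhere in the project; EVMAM1808_*, USTIMER_*, I2C_* and
   UART_* are board support functions used by the project. */

/* 3x3 mean filter. Only interior pixels are processed, so every
   neighbour access (offset-w-1 ... offset+w+1) stays inside the image. */
void main_filter(BYTE *imgIn, BYTE *imgOut, int w, int h)
{
    int i, j, offset_j, offset, tmp;

    offset_j = w;                       /* start at row 1 */
    for (j = 1; j < h - 1; j++) {
        for (i = 1; i < w - 1; i++) {
            offset = offset_j + i;
            tmp = imgIn[offset]
                + imgIn[offset - w]
                + imgIn[offset - w - 1]
                + imgIn[offset - w + 1]
                + imgIn[offset + w]
                + imgIn[offset + w - 1]
                + imgIn[offset + w + 1]
                + imgIn[offset - 1]
                + imgIn[offset + 1];
            tmp /= 9;                   /* mean of the 9 samples */
            imgOut[offset] = tmp;
        }
        offset_j += w;                  /* advance to the next row */
    }
}

int main(void)
{
    uint32_t rtn;
    int i;

    /* Board bring-up (not part of the benchmarked work) */
    EVMAM1808_init();
    EVMAM1808_initRAM();
    USTIMER_init();
    I2C_init(I2C0, I2C_CLK_400K);
    rtn = UART_init(DEBUG_PORT, 115200);

    UART_txString(DEBUG_PORT, "\r\n\r\n********** Start **********\r\n\r\n");

    /* Benchmark: 500 passes of the filter over a 192x192 image */
    for (i = 0; i < 500; i++) {
        main_filter(image, image2, 192, 192);
    }

    UART_txString(DEBUG_PORT, "\r\n\r\n********** Success **********\r\n\r\n");
    return 0;
}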

BR

  • Hi,

    The difference between the two running times is indeed very significant; however, I am not sure how you are measuring these times, which makes it hard to pinpoint the source of the delay. Below I consider three scenarios, in order of increasing precision.

    - (Least precise) If you are measuring the time from pressing "Run" inside CCS and comparing it with the run time of the Linux application, keep in mind that the standalone measurement includes initialization routines such as EVMAM1808_init() and EVMAM1808_initRAM(), which the Linux application does not execute.

    - If you are measuring the time between the UART "start" and "success" messages, you get a closer estimate of the execution time of the routine itself. However, the result may still be skewed by the different execution times of the UART routines in the standalone and Linux versions (although I agree that a difference of ~50 s is still a lot). In this case, make sure the hardware is configured the same way in both environments: cache enabled, the device's PLL running at the same speed, the same optimization level, etc.

    - (Most precise) If you count the number of cycles your code takes to complete the 500-iteration for() loop from inside CCS (this can be done for both the standalone and the Linux executables), you see exactly how the compiler is optimizing the routine. Also, since the cycle count is independent of the actual clock speed, the PLL configuration has no influence on it. If you still see a difference, I would suspect that the compiler's optimizer is either turned off or not optimizing the code inside main_filter() well, which may call for some optimization techniques.
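On the Linux side, the second scenario above (timing only the filter loop, without initialization or console output) can be approximated with the POSIX clock_gettime() call. A minimal sketch, assuming the BYTE type and main_filter() signature from the original post; time_filter() and elapsed_s() are hypothetical helper names. A comparable measurement on the standalone build would read a hardware timer instead; that API is board-specific and is not shown here.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef unsigned char BYTE;       /* assumption: matches the project's BYTE */
typedef void (*filter_fn)(BYTE *imgIn, BYTE *imgOut, int w, int h);

/* Seconds between two CLOCK_MONOTONIC timestamps. */
double elapsed_s(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec)
         + (double)(t1->tv_nsec - t0->tv_nsec) * 1e-9;
}

/* Times only the repeated filter calls, so process start-up and
   console output are excluded from the result. */
double time_filter(filter_fn filter, BYTE *in, BYTE *out,
                   int w, int h, int runs)
{
    struct timespec t0, t1;
    int i;

    assert(filter != NULL && runs > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < runs; i++)
        filter(in, out, w, h);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_s(&t0, &t1);
}
```

For example, `double s = time_filter(main_filter, image, image2, 192, 192, 500);` reports only the filter time. With glibc versions older than 2.17, clock_gettime() lives in librt, so the program has to be linked with -lrt.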

    The references below should help with this investigation:

    http://processors.wiki.ti.com/index.php/ARM_compiler_optimizations

    http://processors.wiki.ti.com/index.php/Linux_Debug_in_CCSv5

    http://processors.wiki.ti.com/index.php/Profile_clock_in_CCS
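As an example of a source-level optimization for main_filter() (a swapped-in technique, not something proposed in this thread), the 3x3 mean can be split into a vertical pass that sums three rows for every column and a horizontal pass that adds three neighbouring column sums. This cuts image reads from 9 to 3 per output pixel, at the cost of 3 reads from a small scratch buffer. The sketch assumes an 8-bit BYTE type as in the post; mean3x3_colsum() and its colsum scratch parameter are hypothetical. Whether it is faster on the AM1808 has not been measured, and it will not make up for a disabled cache.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char BYTE;   /* assumption: 8-bit unsigned pixel, as in the post */

/* 3x3 mean filter over interior pixels using per-column vertical sums.
 * colsum: caller-provided scratch buffer of at least w entries
 * (hypothetical extra parameter; 3 * 255 fits in unsigned short).
 * Border pixels of imgOut are not written. */
void mean3x3_colsum(const BYTE *imgIn, BYTE *imgOut, int w, int h,
                    unsigned short *colsum)
{
    int x, y;

    assert(w >= 3 && h >= 3 && colsum != NULL);
    for (y = 1; y < h - 1; y++) {
        const BYTE *above = imgIn + (size_t)(y - 1) * w;
        const BYTE *row   = imgIn + (size_t)y * w;
        const BYTE *below = imgIn + (size_t)(y + 1) * w;
        BYTE *out         = imgOut + (size_t)y * w;

        /* Vertical pass: sum the three rows for every column. */
        for (x = 0; x < w; x++)
            colsum[x] = (unsigned short)(above[x] + row[x] + below[x]);

        /* Horizontal pass: each output adds three neighbouring column sums. */
        for (x = 1; x < w - 1; x++) {
            unsigned sum = (unsigned)colsum[x - 1] + colsum[x] + colsum[x + 1];
            out[x] = (BYTE)(sum / 9u);   /* unsigned division by a constant */
        }
    }
}
```

The interior results match the original nine-term sum divided by 9; the scratch buffer is passed in rather than allocated so the routine stays usable without a heap on the standalone build.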

    Hope this helps,

    Rafael

     

     

  • Hello Wei-ching Lin,

    Since your non-OS version is so slow, I wonder whether you have enabled the cache and MMU. You can find examples of setting up the cache and MMU in StarterWare.

    http://www.ti.com/tool/starterware-sitara
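A minimal sketch of what enabling the caches involves on the AM1808's ARM926EJ-S core, to compare against the StarterWare examples. On this core, data accesses are treated as non-cacheable while the MMU is off, so the D-cache needs a translation table as well. The sketch builds a flat (virtual = physical) first-level table of 1 MB sections in which one region is write-back cacheable and everything else, including peripherals, stays uncached, and then sets the M, C and I bits in the CP15 control register. Assumptions: the function names are hypothetical; external DDR is assumed to start at 0xC0000000 (check the AM1808 data sheet and the board's actual RAM size); the table must be 16 KB aligned; and the inline assembly uses GCC syntax, so for the TI compiler in CCS 4.2 it would need to move into a separate assembly routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ARMv5 (ARM926EJ-S) first-level section descriptor fields */
#define SECTION_TYPE   0x2u          /* bits[1:0] = 0b10: 1 MB section */
#define SECTION_B      (1u << 2)     /* bufferable */
#define SECTION_C      (1u << 3)     /* cacheable */
#define SECTION_BIT4   (1u << 4)     /* should be 1 on ARM926EJ-S */
#define SECTION_AP_RW  (3u << 10)    /* AP = 0b11: read/write access */
#define TTB_ENTRIES    4096u         /* 4096 x 1 MB sections = 4 GB */

/* Flat mapping, domain 0. Sections [cached_base_mb, cached_base_mb +
   cached_size_mb) are write-back cacheable; all others are uncached. */
void mmu_build_flat_table(uint32_t *ttb, uint32_t cached_base_mb,
                          uint32_t cached_size_mb)
{
    uint32_t i;

    assert(ttb != NULL && cached_base_mb + cached_size_mb <= TTB_ENTRIES);
    for (i = 0; i < TTB_ENTRIES; i++) {
        uint32_t desc = (i << 20) | SECTION_AP_RW | SECTION_BIT4 | SECTION_TYPE;
        if (i >= cached_base_mb && i < cached_base_mb + cached_size_mb)
            desc |= SECTION_C | SECTION_B;
        ttb[i] = desc;
    }
}

#if defined(__arm__)
/* Call once at start-up, before the D-cache has ever been enabled.
   ttb must be 16 KB aligned. GCC inline assembly only. */
void mmu_and_caches_enable(const uint32_t *ttb)
{
    uint32_t ctrl, zero = 0;

    __asm__ volatile("mcr p15, 0, %0, c7, c7, 0" :: "r"(zero));  /* invalidate I+D caches */
    __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" :: "r"(zero));  /* invalidate TLBs */
    __asm__ volatile("mcr p15, 0, %0, c2, c0, 0" :: "r"(ttb));   /* translation table base */
    __asm__ volatile("mcr p15, 0, %0, c3, c0, 0" :: "r"(1u));    /* domain 0 = client */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));
    ctrl |= (1u << 0) | (1u << 2) | (1u << 12);                 /* M, C, I bits */
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" :: "r"(ctrl) : "memory");
}
#endif
```

For example, with GCC: declare `static uint32_t ttb[4096] __attribute__((aligned(16384)));`, then call `mmu_build_flat_table(ttb, 0xC00u, 512u);` and `mmu_and_caches_enable(ttb);` early in main(), before the benchmark loop. Any buffer in the cached region that is shared with DMA would then need explicit cache maintenance.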