C66x compiler optimization



Hi Everyone,

I am a student, and I recently started porting a host-side workload to a DSP. I have run into some compiler behavior that confuses me.

My environment is as follows:

1. I am using a TI C6678 EVM board.

2. I am running a task on SYS/BIOS.

3. I use only a single core, with the task set up on Core 0.

The code run by the task is quite simple:

for (i = 0; i < NI; i++)
{
    for (j = 0; j < NJ; j++)
    {
        C[i*NJ + j] *= BETA;

        for (k = 0; k < NK; ++k)
        {
            C[i*NJ + j] += ALPHA * A[i*NK + k] * B[k*NJ + j];
        }
    }
}

NI, NJ, and NK are all fixed at 4096.
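
For reference, a self-contained sketch of this loop nest is below. It is only an illustration under assumptions: the element type is taken to be float (the original type is not stated), the function name gemm_kernel and its parameters are hypothetical, the sizes are passed in rather than fixed at 4096, and the running sum is kept in a local variable instead of being written back to C on every k iteration. The restrict qualifiers state that the three arrays do not overlap.

```c
#include <stddef.h>

/* Hypothetical helper: computes C = beta*C + alpha*A*B for row-major
 * matrices. A is ni x nk, B is nk x nj, C is ni x nj.
 * Assumption: float elements. restrict promises the compiler that
 * a, b and c never refer to overlapping memory. */
void gemm_kernel(size_t ni, size_t nj, size_t nk,
                 float alpha, float beta,
                 const float *restrict a,
                 const float *restrict b,
                 float *restrict c)
{
    for (size_t i = 0; i < ni; i++) {
        for (size_t j = 0; j < nj; j++) {
            /* Scale the existing entry once, then accumulate locally. */
            float acc = c[i * nj + j] * beta;
            for (size_t k = 0; k < nk; k++) {
                acc += alpha * a[i * nk + k] * b[k * nj + j];
            }
            c[i * nj + j] = acc;
        }
    }
}
```

With a local accumulator the inner loop only loads from A and B, which is the same access pattern the restrict qualifiers describe. Whether this changes the generated schedule on a given compiler version is not something this sketch can show.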

When I tried to optimize my code with the compiler, I ran into the following issues.

1. Since I am sure that A[], B[], and C[] do not alias one another, I added the restrict keyword to all three. The software pipeline information then shows that the ideal execution time drops to 1/3. I suspected this might not be real, so I compared the output with and without the restrict keyword. The results do differ in some values, though they are not completely different.

2. I tried to measure the execution time during the run. The timing code is as follows:

// // Get Stop Time
// g_ui64StopTime = (uint64_t)(TSCL) ;
// g_ui64StopTime |= (uint64_t)((uint64_t)TSCH << 32 ) ;
// g_ui64ElapsedTime = g_ui64StopTime - g_ui64StartTime;
//
// // Get Start Time
// g_ui64StartTime = (uint64_t)(TSCL) ;
// g_ui64StartTime |= (uint64_t)((uint64_t)TSCH << 32 ) ;

When I insert it in the k loop, the compiler produces different results. The software pipeline information again shows that the execution time should be reduced by 3X, but when I run the program, the actual execution time is 50% longer.

Another interesting thing: when I insert the timing code in the i loop, the software pipeline information does not change, but the execution time drops from 140 min to 25 min compared with the case without the timing code (I measured the time with a watch).
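
The timing snippet above boils down to combining two 32-bit register reads into one 64-bit cycle count and subtracting two such counts. A minimal host-side sketch of that arithmetic follows. The helper names combine_tsc and elapsed_cycles are hypothetical, and the lo and hi arguments stand in for values read from TSCL and TSCH (low half first, as in the snippet), so this runs on any C compiler but does not touch the real registers.

```c
#include <stdint.h>

/* Hypothetical helper: build a 64-bit timestamp from two 32-bit halves.
 * lo stands in for a TSCL read, hi for the TSCH read that follows it. */
uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Hypothetical helper: cycles between two timestamps. Unsigned
 * subtraction stays correct when the low half wraps, as long as the
 * high half is included in both values. */
uint64_t elapsed_cycles(uint64_t start, uint64_t stop)
{
    return stop - start;
}
```

For example, a start of lo = 0xFFFFFFF0, hi = 0 and a stop of lo = 0x10, hi = 1 gives an elapsed count of 0x20 cycles, even though the low half wrapped in between.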

These observations really confuse me. I would really appreciate it if you could share your opinion with me.

Best Regards,

Jie