TMS320C6748: optimize

Part Number: TMS320C6748

Hi,

    My program contains the following code. On the C6748 (300 MHz clock) it takes almost 10 s to run, and most of that time is spent calculating gama[]. How should I optimize it? Any suggestions would be appreciated!

 

for(theta=0;theta<1280;++theta)
{
       for(j=theta;j<theta+cp;++j)
          {
              gama[theta]=gama[theta]+cabsf(data[j]*conjf(data[j+1280]))-(1/2.0)*(pow(cabsf(data[j]),2)+pow(cabsf(data[j+1280]),2));
          }
}

  • The team is notified. They will post their feedback directly here.

    BR
    Tsvetolin Shulev
  • I don't think this question is DSP related. The TI guys may disagree. It is more of a general coding issue. From what I can see, the inner loop recalculates the same values more than once: each term is needed by up to cp neighbouring values of theta, so the larger cp is, the more work is repeated. I suggest precalculating the terms.

    float lookup[1280+cp];
    
    /* Precalculate every term once; the loops below read j = 0 .. 1278+cp */
    for(j=0;j<1279+cp;j++)
      lookup[j] = cabsf(data[j]*conjf(data[j+1280]))
                - (1/2.0)*(pow(cabsf(data[j]),2)+
                          pow(cabsf(data[j+1280]),2));
    
    for(theta=0;theta<1280;++theta)
    {
      for(j=theta;j<theta+cp;++j)
      {
        gama[theta] = gama[theta] + lookup[j];
      }
    }
    

    A sliding window would reduce the amount of memory used for the lookup, but it is a bit more complicated to code.

  • I would also avoid calculating the square with the pow() function.

  • What is the best way to calculate the square?
  • I would write y = x * x instead of y = pow(x, 2), even if I have to introduce a temporary variable. A function call inside the loop prevents the compiler from software-pipelining it, and pipelining is crucial for performance.
  • How should the sliding window be programmed? Can you provide some ideas to guide me?
  • What I call a "sliding window" is pre-calculating values only as they are needed. Something like this:

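    /* Function rather than a macro, so cabsf() is evaluated only once per square */
    static inline float sqr(float x) { return x*x; }
    
    /* Assumes data[] holds float complex samples, matching cabsf()/conjf() */
    static inline float calc(float complex data[], int j)
    {
      return cabsf(data[j]*conjf(data[j+1280])) -
             0.5f*( sqr(cabsf(data[j])) +
                    sqr(cabsf(data[j+1280])) );
    }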
    
    void do_calc(void)
    {
      int theta;
      int j;
      int k;
      float lookup[cp];
    
      // Calculate the first cp where theta = 0
      for(k=0; k<cp; k++)
        lookup[k] = calc(data, k);
    
      theta = 0;
      for(;;)
      {
        // Do one theta across calc array.
        for(j=theta,k=0; j<theta+cp; ++j,++k)
          gama[theta] = gama[theta] + lookup[k];
    
        // Note j is now theta+cp
    
        theta++;
        if(theta >= 1280) break;
    
        // Setup for next iteration
    
        // Move calc entries down to make room for new one
        for(k=0; k<(cp-1); k++)
          lookup[k] = lookup[k+1];
    
        // Note k is now (cp-1)
        // Calc next value at j or (last theta+cp)
        lookup[k] = calc(data, j);
      }
    }
    

    The inline function should not result in an actual function call that would break pipelining. You could replace the inline function with a macro. If cabsf() or conjf() are real function calls rather than intrinsics or inlined code, pipelining will still be disrupted.

    Additional processing time is required to shuffle the entries of the lookup array down, so this version will be slower than pre-calculating all the values.