TMS320C6748: optimize

user5075695

Part Number: TMS320C6748

Hi,

My program contains the following code. On the C6748 (300 MHz clock) it takes almost 10 s, and most of that time is spent computing gama[]. How should I optimize it? Any suggestions would be appreciated.

for(theta=0;theta<1280;++theta)
{
for(j=theta;j<theta+cp;++j)
{
gama[theta]=gama[theta]+cabsf(data[j]*conjf(data[j+1280]))-(1/2.0)*(pow(cabsf(data[j]),2)+pow(cabsf(data[j+1280]),2));
}
}

over 6 years ago

0 Cvetolin Shulev-XID over 6 years ago

TI__Guru 65405 points

The team is notified. They will post their feedback directly here.

BR
Tsvetolin Shulev

0 Norman Wong over 6 years ago

Guru 26430 points

I don't think this question is DSP related. The TI guys may disagree. It is more of a general coding thing. From what I can see, the inner loop appears to recalculate the same values more than once. The larger cp is, the worse the repeated calc. Suggest precalculating them.

float lookup[1280+cp];


// theta is not set yet at this point; fill every index the loops below read (0 .. 1280+cp-2)
for(j=0;j<1280+cp-1;j++)
  lookup[j] = cabsf(data[j]*conjf(data[j+1280]))
            - (1/2.0)*(pow(cabsf(data[j]),2)+
                      pow(cabsf(data[j+1280]),2));

for(theta=0;theta<1280;++theta)
{
  for(j=theta;j<theta+cp;++j)
  {
    gama[theta] = gama[theta] + lookup[j];
  }
}

A sliding window would reduce the amount of memory used for the lookup, but it is a bit more complicated to code.

0 Victor Kazmirenko over 6 years ago in reply to Norman Wong

Guru 13042 points

I would also avoid computing the square with the pow() function.

0 user5075695 over 6 years ago in reply to Victor Kazmirenko

Intellectual 350 points

What is the best way to calculate a square?

0 Victor Kazmirenko over 6 years ago in reply to user5075695

Guru 13042 points

I would write y = x * x instead of y = pow(x, 2), even if I have to introduce a temporary variable. A function call inside the loop prevents software pipelining of that loop, which is crucial for performance.

0 user5075695 over 6 years ago in reply to Norman Wong

Intellectual 350 points

How should the sliding window be programmed? Can you provide some ideas to guide me?

0 Norman Wong over 6 years ago in reply to user5075695

Guru 26430 points

What I call a "sliding window" means pre-calculating values only as they are needed. Something like this.

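#define SQR(_x) ((_x)*(_x))  // parenthesized; note the argument is still evaluated twice

// Assumes data[] is float complex, as the cabsf()/conjf() calls suggest
static inline float calc(float complex data[], int j)
{
  // Temporaries so each cabsf() runs once instead of twice inside SQR()
  float a = cabsf(data[j]);
  float b = cabsf(data[j+1280]);
  // 0.5f keeps the arithmetic in single precision
  return cabsf(data[j]*conjf(data[j+1280])) -
         0.5f*( SQR(a) + SQR(b) );
}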

void do_calc(void)
{
  int theta;
  int j;
  int k;
  float lookup[cp];

  // Calculate the first cp where theta = 0
  for(k=0; k<cp; k++)
    lookup[k] = calc(data, k);

  theta = 0;
  for(;;)
  {
    // Do one theta across calc array.
    for(j=theta,k=0; j<theta+cp; ++j,++k)
      gama[theta] = gama[theta] + lookup[k];

    // Note j is now theta+cp

    theta++;
    if(theta >= 1280) break;

    // Setup for next iteration

    // Move calc entries down to make room for new one
    for(k=0; k<(cp-1); k++)
      lookup[k] = lookup[k+1];

    // Note k is now (cp-1)
    // Calc next value at j or (last theta+cp)
    lookup[k] = calc(data, j);
  }
}

The inline function should not result in an actual function call, which would break pipelining. You could replace the inline function with a macro. If cabsf() or conjf() are real function calls, pipelining will still be disrupted.

Additional processing time is required to shuffle the entries of the lookup array down, so this will be slower than pre-calculating all values.
