Problem in Speeding Up Floating Point Computations on DM3730

Mohit Hada

Other Parts Discussed in Thread: DM3730

Hi,

I have code that requires around (10E+6)*(10E+6) floating-point
multiplications and the same number of floating-point additions,
because I am computing an autocorrelation over 10E+6 samples using
circular shifting.

When I run my code on the ARM core of the DM3730 (since I don't
know how to use the DSP core), around 50E+6 multiplications and
additions take more than 5 minutes, which is far too slow for my
requirement. My BB runs Angstrom and I am not able to use the hard FPU.

Can somebody suggest how to speed up the computation on the BB so
that I can build a feasible system?

A snippet of my code follows. Please help me, I am stuck....

The maximum value of k = 1000000.

for (i = 1; i <= k; i++)
{
    /* correlate the stored copy with the current (shifted) buffer */
    sum = 0;
    for (j = 1; j <= k; j++)
    {
        sum = sum + (*(prdc_pulse_out_store + j)) * (*(prdc_pulse_out + j));
    }

    *(Rxy + i) = (sum / k);

    // circular shifting
    temp = *(prdc_pulse_out + k);
    for (j = k; j >= 2; j--)
    {
        *(prdc_pulse_out + j) = *(prdc_pulse_out + j - 1);
    }
    *(prdc_pulse_out + 1) = temp;
}

over 14 years ago

0 Norman Wong over 14 years ago

Guru 26430 points

I don't think there is much you can do about the floating point operations. I am assuming you are using double throughout. If you are using float, you might be able to carefully cast or order your operations to avoid promotion to double. That depends on the compiler.
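As a minimal sketch of what avoiding promotion to double can look like in C: the function name `mean_product_f` is hypothetical, and whether this helps on a given target depends on the compiler and FPU, as noted above.

```c
#include <stddef.h>

/* Mean of element-wise products, kept in single precision throughout.
   Hypothetical helper for illustration only. */
float mean_product_f(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;               /* float accumulator */
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];         /* float * float stays float */

    /* 1.0f rather than 1.0: a double literal here would make the
       division and the multiplication run in double precision. */
    return sum * (1.0f / (float)n);
}
```

A float accumulator loses precision when summing a very large number of terms, so the result should be checked against a double version before relying on it.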

You could avoid shuffling your circular buffer by using a base index. Something like:

j0 = 1;
for (i = 1; i <= k; i++)
{
    sum = 0;
    for (j = 1; j <= k; j++)
    {
        jout = j0 + j - 1;
        if (jout > k)
            jout = jout - k;
        sum += prdc_pulse_out_store[j] * prdc_pulse_out[jout];
    }

    Rxy[i] = (sum / k);

    j0--;
    if (j0 <= 0)
        j0 = k;
}

Placing variables in registers might help but I think the FP ops will take the majority of the time.
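The same base-index idea can be sketched with 0-based arrays, with the inner loop split into two straight runs so that the wrap test is not executed on every iteration. The names `circ_corr`, `ref`, `sig` and `rxy` are hypothetical, and the use of float is an assumption about the data type.

```c
#include <stddef.h>

/* Circular correlation of ref[] against sig[], both of length k.
   Lag i pairs ref[j] with sig[(j - i) mod k], which is what rotating
   sig right by one element per outer iteration produces. */
void circ_corr(const float *ref, const float *sig, float *rxy, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        size_t base = (k - i) % k;      /* element of sig paired with ref[0] */
        float sum = 0.0f;
        size_t j = 0;

        for (size_t s = base; s < k; s++, j++)   /* sig[base .. k-1] */
            sum += ref[j] * sig[s];
        for (size_t s = 0; s < base; s++, j++)   /* wrapped part: sig[0 .. base-1] */
            sum += ref[j] * sig[s];

        rxy[i] = sum / (float)k;
    }
}
```

This removes the k element moves per outer iteration, but the number of multiply-adds is still k*k.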

I noticed that you are indexing from 1. Was this ported from FORTRAN code? Since the loops run from 1 to k, your arrays should allocate one extra element (k+1) to avoid memory problems.

Maybe the better question: are there other auto-correlation algorithms?

0 Mohit Hada over 14 years ago in reply to Norman Wong

Intellectual 745 points

Hi Norman,

Actually I am a VHDL programmer for FPGAs, so I do not have hands-on experience with C....

So thanks for the clue about removing the circular buffer shifting... this will help...

I replicated the MATLAB code, which is why the indexing from 1 remains as it is....

Yeah, I have to look for some other autocorrelation algorithms.

regards

mohit

0 Norman Wong over 14 years ago in reply to Mohit Hada

Guru 26430 points

If you know the dynamic range and precision of your data, you could go with fixed-point integer math. It is just a matter of keeping track of the binary point. With integer math, restricting k to 2^n allows the division to be turned into a right shift.
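A minimal sketch of that idea for a single lag, assuming the samples fit in Q15 format (16-bit integers scaled by 2^15) and that k is a power of two. The function name `corr_lag_q15` and the choice of Q15 are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One lag of the circular correlation in fixed point.
   a[] and b[] hold Q15 samples; k must equal 1 << log2k and lag < k.
   Each product is Q30, so the returned mean is also Q30. */
int64_t corr_lag_q15(const int16_t *a, const int16_t *b,
                     size_t k, unsigned log2k, size_t lag)
{
    int64_t acc = 0;                    /* 64-bit so k Q30 products cannot overflow */
    size_t mask = k - 1;                /* wrap with AND, valid for power-of-two k */

    for (size_t j = 0; j < k; j++)
        acc += (int32_t)a[j] * (int32_t)b[(j + k - lag) & mask];

    /* Divide by k with a shift. Right-shifting a negative value is
       implementation-defined in C; GCC performs an arithmetic shift. */
    return acc >> log2k;
}
```

To convert the Q30 result back to a real value, divide it by 2^30.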

0 Hongtao Yan over 14 years ago in reply to Norman Wong

Intellectual 395 points

Hi Norman:

When you say there is not much we can do about the floating point operations, are you assuming that the customer is already using the VFP in the Cortex-A8? If not, could you suggest how to use the VFP inside the ARM core? NEON can handle floating point calculations as well. How could we utilize that on an Android system? Currently our software team is complaining that math calculation is 50% slower on the DM3730 compared to the Qualcomm QSD8250. We are using Benchmark-Release.apk from Google to verify the performance. Perhaps you could shed some light on why we are seeing such a big difference in math performance.

0 Norman Wong over 14 years ago in reply to Hongtao Yan

Guru 26430 points

Sorry, I have little experience with HW FPUs, VFP or the DM3730. I let the compiler take care of that by selecting a HW FPU if available. I did not see any way to reduce the FP (HW or SW) operations in the algorithm presented by Mohit Hada.
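For readers who want to check what the compiler actually selected, the sketch below reports the floating-point mode of a build using GCC's predefined macros. It assumes a GCC-style ARM toolchain; the function name `fp_build_mode` and the toolchain prefix in the comment are hypothetical. The `-mfloat-abi=softfp` option lets the compiler emit VFP or NEON instructions while keeping the soft-float calling convention, which may matter on a distribution that is not built for the hard-float ABI.

```c
/* Possible build line (GCC options; the toolchain prefix is an assumption):
     arm-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp file.c
*/

/* Report which floating-point mode this translation unit was compiled for. */
const char *fp_build_mode(void)
{
#if defined(__SOFTFP__)
    /* GCC defines __SOFTFP__ for -mfloat-abi=soft: FP math goes through library calls */
    return "software floating point";
#elif defined(__ARM_NEON__) || defined(__ARM_NEON)
    return "hardware floating point, NEON enabled";
#elif defined(__arm__)
    return "hardware floating point, VFP without NEON";
#else
    return "not an ARM build";
#endif
}
```

Printing the returned string once at program start is enough to confirm that the flags reached the compiler.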
