Using rsqr for division on a C674x

Other Parts Discussed in Thread: OMAP-L138

I'm using the 7.4.1 compiler on the C674x of an OMAP-L138.

While optimizing a loop, I found that I had an unavoidable single-precision floating-point division. Floating-point division results in a function call, which disqualifies the loop from pipelining. It occurred to me that, since I know the denominators in my loop are positive and non-zero, I could just replace b/a with b*(_rsqrsp(a)^2). I got a 5x speedup in my loop execution time as a result (which makes sense from the compiler's point of view: I was going from a non-pipelined, sequential loop to a pipelined, parallelized loop).

Replacing a division with multiplication by a squared inverse square root seems quite Rube Goldberg-ish to me. Is this technique well known and I'm just out of the loop (so to speak), or is it avoided because it is known to cause problems that I can't foresee?

Thanks,

   Jay

  • Why not use _rcpsp(a) instead of pow(_rsqrsp(a), 2)?

    Both _rcpsp and _rsqrsp are low-precision approximations, so the result you get will be only approximate. These intrinsics are intended to be used as the starting guess for something like a Newton-Raphson iteration.

    Keep in mind that in C, ^ is integer XOR, not power.
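
The Newton-Raphson idea mentioned in this reply might look roughly like the sketch below. It is written so that it runs on a host machine: `seed_rcp` is a hypothetical stand-in that mimics a seed with about 8 good mantissa bits by truncating the exact reciprocal, and on the C674x one would presumably call `_rcpsp(a)` in its place. The names `refine_rcp` and `approx_div` are likewise made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host stand-in for a low-precision seed such as _rcpsp:
 * computes 1/a, then keeps only the top 8 of the 23 mantissa bits. */
static float seed_rcp(float a)
{
    float x = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF8000u;              /* clear the low 15 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for f(x) = 1/x - a:  x <- x * (2 - a*x).
 * Each step roughly squares the relative error of x. */
static float refine_rcp(float a, float x)
{
    return x * (2.0f - a * x);
}

/* Approximates b / a with a seed plus `iters` refinement steps.
 * Assumes a is positive and finite, as in the original question. */
float approx_div(float b, float a, int iters)
{
    float x;
    int i;
    assert(a > 0.0f);
    x = seed_rcp(a);
    for (i = 0; i < iters; i++)
        x = refine_rcp(a, x);
    return b * x;
}
```

With `iters` set to 0 the result carries only the seed's accuracy; one step roughly doubles the number of good bits, and two steps get close to single-precision accuracy. The refinement uses only multiplies and subtracts, so it adds no division call.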

  • Sorry, I was writing in pseudo-math because I was too lazy to write out

    float a, b, res, r;
    r = _rsqrsp(a); res = b * r * r;

    for b/a, and I also thought the pseudo-math would be clearer. Probably too much MATLAB exposure. My bad.

    Anyway, my intended usage does not need particularly high accuracy. Is there some way I can get some bounds on how BAD the accuracy will be?

  • I went and looked in the manual (SPRU733) and found that RSQRSP is good to 8 bits of mantissa, which is good enough for this particular application (although I'll definitely keep the idea of one or two Newton-Raphson iterations in my back pocket in case it proves necessary). I should have gone there first before asking.

    Thanks for the pointer,

        Jay
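
A rough error estimate for the squared inverse square root discussed in this thread. It assumes that "8 bits of mantissa" means the seed r has a relative error ε of magnitude at most about 2^-8; that reading is an assumption, and the manual remains the authority on the exact bound. The second line shows what one Newton-Raphson step on the inverse square root would do to that error.

```latex
r = \frac{1+\varepsilon}{\sqrt{a}}, \quad |\varepsilon| \lesssim 2^{-8}
\;\;\Longrightarrow\;\;
b\,r^{2} = \frac{b}{a}\,(1+\varepsilon)^{2} \approx \frac{b}{a}\,(1+2\varepsilon),
\quad \text{relative error} \lesssim 2^{-7}

r' = r\left(\tfrac{3}{2} - \tfrac{1}{2}\,a\,r^{2}\right)
   = \frac{1}{\sqrt{a}}\left(1 - \tfrac{3}{2}\varepsilon^{2} - \tfrac{1}{2}\varepsilon^{3}\right),
\quad \text{relative error} \approx \tfrac{3}{2}\varepsilon^{2} \lesssim 2^{-15}
```

Under that assumption, squaring the seed costs about one bit of accuracy (roughly 7 good bits in b/a), and a single refinement step before squaring would bring the quotient to roughly 14 good bits.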