I'm using the 7.4.1 compiler on the C674x of an OMAP-L138.
While optimizing a loop, I found that I had an unavoidable single-precision floating-point division. Floating-point division results in a function call, which disqualifies the loop from software pipelining. It occurred to me that, since I know the denominators in my loop are always positive, I could just replace b/a with b*(_rsqrsp(a)^2), where ^2 means "squared" rather than C's XOR operator. As a result I got a 5x speedup in my loop execution time, which makes sense from the compiler's point of view: I was going from a non-pipelined, sequential loop to a pipelined, parallelized loop.
Replacing a division with a multiplication by the square of an inverse square root seems quite Rube Goldberg-ish to me. Is this technique well known and I'm just out of the loop (so to speak), or is it avoided because it is known to cause problems that I can't foresee?
Thanks,
Jay