I have an algorithm that needs to calculate the multiplicative inverse of a floating-point number. Speed is important and accuracy less so, so I am using the reciprocal approximation intrinsic _rcpsp.
This is a cross-platform project, so I would like to simulate the output of _rcpsp on the non-TI platforms. Per the TMS320C66x DSP CPU and Instruction Set Reference Guide, the RCPSP result should differ from the exact 1/x by a mantissa error of less than 2^-8, but I am looking for a closer simulation than one that merely meets that bound. For now I am primarily interested in the domain of normal single-precision floating-point numbers; I'll deal with special cases (e.g. FLT_MIN, FLT_MAX, INFINITY, NAN) in the future. I am not concerned with replicating timing.
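As a baseline for comparison, here is a minimal sketch, not TI's algorithm. It assumes RCPSP can be modeled as the exact reciprocal with its mantissa truncated to 8 fraction bits. That model satisfies the documented 2^-8 bound but will not match the hardware bit-for-bit. `rcpsp_sim`, `RCPSP`, and `rcpsp_rel_error` are hypothetical names. The `_TMS320C6X` check assumes the TI compiler predefines that macro. The sketch also assumes 1/x is itself normal, i.e. |x| <= 2^126.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for the C66x _rcpsp intrinsic.
 * Assumed model: correctly rounded 1/x, then the low 15 of the 23
 * fraction bits are cleared, leaving 8 fraction bits. This meets the
 * documented < 2^-8 mantissa error but is NOT the hardware's table. */
static float rcpsp_sim(float x)
{
    float r = 1.0f / x;             /* reference reciprocal */
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits); /* type-pun without aliasing problems */
    bits &= 0xFFFF8000u;            /* keep sign, exponent, top 8 fraction bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Use the real intrinsic on TI targets, the simulation elsewhere
 * (assumes the TI compiler predefines _TMS320C6X). */
#if defined(_TMS320C6X)
#define RCPSP(x) _rcpsp(x)
#else
#define RCPSP(x) rcpsp_sim(x)
#endif

/* Relative error |RCPSP(x) * x - 1|. The float-by-float product is
 * exact in double (24 + 24 significand bits < 53). */
static double rcpsp_rel_error(float x)
{
    return fabs((double)RCPSP(x) * (double)x - 1.0);
}
```

For example, with this model `rcpsp_sim(3.0f)` returns 341/1024 (0.3330078125) rather than 0.33333334. Running `rcpsp_rel_error` over the same inputs against actual `_rcpsp` outputs captured on the C6655 would show how far this truncation model is from the real instruction.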
Does anyone have a solution that is a good approximation of the RCPSP instruction?
There is a similar thread here that was not answered.
C6655, cgt 8.0.4, CCS 6.1.1