I am using the atansp_i function from c67 fastMath 2.01.00.0 on the DSP core of an L138 Logic PD EVM running at 300 MHz, with an XDS510 emulator and CCS 4.2.4. The build is set to release mode and I have tried all of the speed optimisation settings I can find. The code is running from internal RAM.
According to the benchmark table in the c67xfastRTS user guide this should execute in 19 clock cycles. However, it is taking approx. 0.3 us (timed using one of the h/w timers), which is 90 clock cycles at 300 MHz. The standard RTS atanf takes 8 us. I cannot find any benchmark source code to verify the execution speed. I have also tried to use the profile clock in CCS4, but for some reason this seems to give 20000000 counts for a single instruction. All other code also seems to be running slower than I would expect. There is a Linux benchmark test supplied with the EVM which shows an 8k sample MATH_atansp running in 19 us, but I do not have the source for this and I am not sure what they mean by 8k. It seems unlikely that it is doing 8k function calls, since 8192 calls in 19 us would be less than one clock cycle per call at 300 MHz.
Has anyone else seen slower than expected code execution?
Does anyone have any benchmark code with known results which I can run to check execution speed?
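For reference, this is a minimal sketch of the kind of measurement I mean, not code from the fastRTS package. It assumes the DSP's free-running TSCL cycle counter is available through the TI compiler's c6x.h header. The names ns_to_cycles, read_cycles, start_cycles, fake_work and avg_cycles_per_call are my own, and the non-TI branch is only a stand-in so the sketch also compiles on a host PC.

```c
#include <stdint.h>

/* Convert a measured time in nanoseconds to CPU cycles at a clock in MHz.
   Example: 300 ns at 300 MHz is 90 cycles. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_mhz)
{
    return (uint32_t)(((uint64_t)ns * cpu_mhz) / 1000u);
}

#ifdef _TMS320C6X
/* Assumption: building with the TI C6000 compiler, where c6x.h declares
   the TSCL time stamp counter register. */
#include <c6x.h>
static void start_cycles(void) { TSCL = 0; }   /* any write starts the counter */
static uint32_t read_cycles(void) { return TSCL; }
#else
/* Host stand-in (hypothetical): a fake counter advanced by fake_work(). */
static uint32_t fake_cycles;
static void start_cycles(void) { fake_cycles = 0; }
static uint32_t read_cycles(void) { return fake_cycles; }
static float fake_work(float x) { fake_cycles += 19u; return x; }
#endif

/* Average cycles per call of f over n calls, minus the cost of reading
   the counter. Loop overhead is still included, and calling through a
   pointer prevents inlining, so for an inline variant the call should be
   written directly in the loop instead. */
static uint32_t avg_cycles_per_call(float (*f)(float), uint32_t n)
{
    volatile float sink = 0.0f;   /* keeps the calls from being optimised away */
    uint32_t t0, t1, overhead, total, i;

    if (n == 0u) {
        return 0u;
    }

    start_cycles();

    /* Cost of two back-to-back counter reads. */
    t0 = read_cycles();
    t1 = read_cycles();
    overhead = t1 - t0;

    t0 = read_cycles();
    for (i = 0u; i < n; i++) {
        sink = f(0.001f * (float)i);
    }
    t1 = read_cycles();
    (void)sink;

    total = t1 - t0;
    if (total > overhead) {
        total -= overhead;
    }
    return total / n;
}
```

On the target I would pass the function under test as f with n of a few thousand, so that a one-off cache miss or timer read does not dominate the result.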
Thanks.