Bugs in sqrt_16() sqrtv.asm, and non-optimal performance

Brian Willoughby

There is an off-by-one error in the sqrt_16() routine of dsplib for all output values, e.g., sqrt(0) = 0x0001 (0.000030517578125), instead of the expected 0.

In addition, the code is not taking full advantage of the C55x instruction set, and thus its memory, stack, and CPU usage are higher than necessary. The two problems are actually related, although it is not necessary to optimize the routine to fix the bug.

The bug is that the final result must be rounded and then multiplied by 2, but the rounding is applied to the wrong digit because it is applied before the shift/multiply. In other words, instead of actually rounding the value, the code is merely adding 1 to every result, which explains why this routine claims that the square root of 0 is 1.

The fix is to alter the value added to the final result so that it is rounded without excessive offset. Even though sqrt_16() is only an estimation, and it is only carried out to fiveiterations, there's no need to have the additional error of being off by one.

The non-optimal performance is related to two things. First, an intermediate value is temporarily saved in the output vector and accessed once per loop, when there are plenty of registers available that could hold the value. Second, the ROUND opcode and round modes of various other opcodes are never used, and instead a manual rounding is done via creating a temporary stack variable for the offset and inserting additional instructions. It is fairly simple to rewrite this code to use the ROUND function where necessary, and even more importantly to use the R (Round) modifier for instructions that can round a result without incurring an additional processor cycle.

Note that sqrt_16() is interesting because the typically useful FRCT mode of the processor is not used because the iteration involves an explicit divide-by-two on paper than ends up being a free operation when FRCT mode is off. The only catch is that the final result must be multiplied by two (which is where the whole bug comes into play).

I found all of this in the process of writing and testing a sqrt_32() variation which accepts a 32-bit vector and calculates the 16-bit square root of each vector element. Since sqrt_16() was already an off-place operation, the 32-bit to 16-bit operation fits well. There were surprisingly few changes needed to support 32-bit input, and due to the nature of fixed-point division, it makes more sense to start with a 32-bit dividend.

over 14 years ago

Mark M over 14 years ago

TI__Mastermind 30110 points

Nice bug report, Brian.

Thanks for detailing these issues, your fix, and 32-bit enhancement.

I'll submit these to the bug tracker for DSPLIB.

Regards,
Mark

Jiří Babka over 14 years ago

Intellectual 465 points

Hello Brian,

Good work! Could you provide your sqrt_32 routine?

Best regards,

Jiri

Brian Willoughby over 14 years ago in reply to Jiří Babka

Genius 4630 points

Hi Jiri,

I just discovered the DSPLIB Wiki, which is a bit thin at the moment. I plan on putting the fixes for sqrt_16 there first, and then follow up with the sqrt_32 enhancements.

Maarten Venter over 8 years ago in reply to Brian Willoughby

Intellectual 395 points

Hi Brian

I am using the C5510 with a CPU core v2.2. I see the DSPLIB has been modified for the v3 cores. Could be so kind to share your version of sqrt_16 with me. Thanks.

Processors

Processors forum

Bugs in sqrt_16() sqrtv.asm, and non-optimal performance