I have written a Viterbi decoder in assembly (not linear) that is 25% faster than the C64x version that we purchased from a third party. When running the code in DSS (CPU accurate simulator) through Matlab with no noise there are no errors in the output. When running on the target hardware with almost no noise(possibly due to tasks and interrupts) there are bit errors very rarely (every thousandth call to the decoder).
When the old C64x code is run in place of the new decoder it runs without errors. I do not believe this is due to any difference in performance of the decoders.
I have avoided using registers A10-A15 and B10-B15. I have saved and used the return address. I am not modifying any registers like AMR. I have tried disabling interrupts using DINT and RINT (no success).
What else should I be doing and what can I try to find the issue?
Thanks
Chris