I'm working on a C5535 application that uses the FFT accelerator, and I realized that the hwafft_br() bit-reversal function can be sped up.
Old function (at 0xfefe9c in ROM), which takes 2 clocks to move each IQ pair:
bclr ARMS
sub #1, T0, T1
mov T1, BRC0
rptblocal..
mov dbl(*AR0+), AC3
mov AC3, dbl(*(AR1+T0B))
bset ARMS
ret
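Faster function, which takes 1 clock to move each IQ pair and takes the same input arguments. It replaces the two-instruction block-repeat loop with a single repeated memory-to-memory move through CDP: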
sub #1, T0, T1
mov T1, CSR
amov XAR0, XCDP
bclr ARMS
rpt CSR
mov *CDP+, dbl(*(AR1+T0B))
bset ARMS
ret
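To check either routine's output on a host PC, a plain C reference model of the same copy may help. This is only a sketch under assumptions: the names br_copy_ref and rev_carry_inc are made up for illustration, each 32-bit word is taken to hold one IQ pair, and the length is assumed to be a power of two. It mimics the reverse-carry stepping of the destination index rather than the actual address arithmetic on the DSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: step a bit-reversed index by adding the top bit (n/2)
 * and letting the carry ripple toward the LSB, the way a reverse-carry
 * address update behaves. n is assumed to be a power of two. */
static size_t rev_carry_inc(size_t idx, size_t n)
{
    size_t bit = n >> 1;
    while (bit != 0 && (idx & bit)) {
        idx ^= bit;   /* clear the set bit, carry moves one position down */
        bit >>= 1;
    }
    return idx | bit; /* set the first clear bit (wraps to 0 if none left) */
}

/* Hypothetical reference model: the source is read linearly and the
 * destination is written in bit-reversed order, one 32-bit IQ pair at a time. */
void br_copy_ref(const int32_t *src, int32_t *dst, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); /* power-of-two length only */
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        dst[j] = src[i];
        j = rev_carry_inc(j, n);
    }
}
```

Running the same input through this model and through the DSP routine, then comparing the two destination buffers, is one way to confirm that the faster version behaves like the ROM one.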
Posting this in case anyone finds it handy.