I'm running into the exact same issue that Daniel Chen encountered here:
http://e2e.ti.com/support/dsp/c5000/f/109/t/151550
I'm loading a queue with a single cycle of a complex sinusoid with a vector magnitude of 1 (full scale, 32767), stored as eight interleaved real/imaginary pairs:
queue[0] = 32767;
queue[1] = 0;
queue[2] = 23170;
queue[3] = 23170;
queue[4] = 0;
queue[5] = 32767;
queue[6] = -23170;
queue[7] = 23170;
queue[8] = -32767;
queue[9] = 0;
queue[10] = -23170;
queue[11] = -23170;
queue[12] = 0;
queue[13] = -32767;
queue[14] = 23170;
queue[15] = -23170;
Then I bit-reverse it into an "fft_a" buffer (this step produces valid data in fft_a):
hwafft_br(queue,fft_a,8);
Then calling hwafft_8pts:
hwafft_8pts(fft_a, fft_b, FFT_FLAG, SCALE_FLAG);
If I set a breakpoint after the hwafft_8pts call, both fft_a and fft_b buffers are zero. But if I single-step through the hwafft_8pts() function with the emulator, it works fine.
I've tried two different implementations of hwafft_8pts: first, a wrapper function that sets the high 8 bits of the XAR5:0 registers to zero to avoid the known errata in the ROM functions, and second, the SPRABB6 "fixed" code running out of RAM. Both behave exactly the same way: zero result when free-running, perfect result when single-stepping.
Free-running to certain points in the FFT routine can give different results. For example, free-running to the XAR0 = XAR4 "start 1st double stage" instruction and single-stepping from there gives a nonzero, but corrupted, output.
Help?