Part Number: TMS320C6748
To put this in context, back in 2013 I bought a TI C6748 LCDK and an XDS100v2 JTAG emulator with a view to making a hi-res (8k) FIR filter for audio purposes. The algorithm is based on an existing one I implemented and successfully use in my audio mastering software, Har-Bal, which runs efficiently on a PC (even one circa 2005), so it is numerically efficient. Perhaps naively, I thought the LCDK would have more than enough grunt to implement a stereo 8k FIR filter in hardware, but when I started development I quickly found that it ran a lot slower than I was expecting. After a lot of stuffing around and not much success I put it aside, as I had more important things to do. Now I am returning to it, am hitting the same issue, and have spent days trying to understand what is wrong with my system.
To the problem: I have verified the clock speed of the LCDK to be 300 MHz based on the advice given here,
https://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/115/t/54812
so it appears the clock speed is not an issue. Internally my code uses the DSP lib functions,
DSPF_sp_cfftr2_dit()
DSPF_sp_icfftr2_dif()
using a radix-2 FFT size of 256. My algorithm produces the correct results, but painfully slowly, so much so that processing in anything near real time is not possible. I therefore used the TSCL/TSCH approach from the thread above to measure the cycle count for a call to DSPF_sp_cfftr2_dit() and compare it with what your manual states it should approximately be, i.e.
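/* Any write to TSCL starts the free-running time-stamp counter */
TSCL = 0;
TSCH = 0;
/* Read TSCL first; that read latches the matching upper half into TSCH */
t_start = TSCL;
t_start += (unsigned long long)TSCH << 32;
/* Transform to frequency domain */
DSPF_sp_cfftr2_dit(Hm[cn], w, BLOCK_LENGTH);
t_stop = TSCL;
t_stop += (unsigned long long)TSCH << 32;
/* Elapsed cycles for the FFT call, including the counter-read overhead */
t_overhead = t_stop - t_start;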
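For reference, the 64-bit bookkeeping in the snippet above can be pulled into two small helpers and checked on a PC. This is only a sketch: the names tsc_combine and tsc_elapsed are hypothetical, not part of any TI header, and the two halves are passed in as plain arguments rather than read from TSCL and TSCH.

```c
#include <stdint.h>

/* Hypothetical helper: join the low and high 32-bit halves of a
   time-stamp counter into one 64-bit value. On the C6748 the halves
   would come from TSCL (read first) and then TSCH. */
static uint64_t tsc_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: cycles elapsed between two 64-bit readings.
   Unsigned subtraction gives the right answer as long as the counter
   has not wrapped more than once between the readings. */
static uint64_t tsc_elapsed(uint64_t start, uint64_t stop)
{
    return stop - start;
}
```

With these, the measurement would reduce to taking one combined reading before the FFT call and one after it, then passing both to tsc_elapsed().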
BLOCK_LENGTH is 256 in this case. Executing up to a breakpoint on the t_overhead = ... line and inspecting the variables tells me that it has taken an astonishing 222688 cycles to calculate a 256-point FFT, whereas the manual suggests it should be on the order of 4138. That is nearly 54 times slower than it should be and I haven't the faintest idea why. Can you explain why this might be and what I could be doing wrong? I am currently using Code Composer Studio 5.5.0.00077, which I installed back in 2013 when I first got the LCDK. Also note that there are no calls to any system functions controlling the LCDK processor or peripherals prior to this initialisation code, if that helps at all.
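To make the size of the gap concrete, the arithmetic can be sketched as below. The cycle counts and the 300 MHz clock are the figures quoted above; the function names are hypothetical and exist only for illustration.

```c
/* Figures quoted in the post */
#define CPU_HZ          300000000.0  /* verified LCDK clock, 300 MHz */
#define MEASURED_CYCLES 222688.0     /* observed for one 256-point FFT */
#define EXPECTED_CYCLES 4138.0       /* approximate figure from the manual */

/* Convert a cycle count to microseconds at the given clock rate. */
static double cycles_to_us(double cycles, double hz)
{
    return cycles / hz * 1e6;
}

/* Ratio of measured to expected cycles. */
static double slowdown(double measured, double expected)
{
    return measured / expected;
}
```

At 300 MHz, cycles_to_us(MEASURED_CYCLES, CPU_HZ) comes to roughly 742 microseconds per FFT, against roughly 13.8 microseconds for the expected count, and slowdown(MEASURED_CYCLES, EXPECTED_CYCLES) is about 53.8.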
Thanks in advance,
Paavo Jumppanen.