C6457(1GHz) FFT function timing

Guangyi Wang

We used the 1K FFT implementations from the C64x+ library in our project. The target DSP is a 1 GHz C6457, and the FFT function used is DSP_fft16x16(). From document SPRUEB8B, we found that the number of cycles for SA is 4218, which corresponds to 4.218 us on a 1 GHz DSP. However, the measured processing time is about 86 us. Obviously, our usage of the assembly library is not correct. Could you please help us get the right FFT assembly implementation/usage for our project?

For reference, the natural C timing is measured as 38 us (optimization level 3 with full suppression of debugging) compared to 26268 cycles in the TI document, and the intrinsic C timing is 9.2 us compared to 5369 cycles.
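To put the three measurements on the same scale, here is a minimal sketch in C that converts between cycles and microseconds and reports how far each measurement is from the documented cycle count. The helper names (cycles_to_us, us_to_cycles, slowdown) are hypothetical and not part of the TI library; the only inputs are the figures quoted above and an assumed 1 GHz core clock.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static double cycles_to_us(uint32_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1.0e6 / clock_hz;
}

/* Convert a measured time in microseconds back to core cycles. */
static double us_to_cycles(double us, double clock_hz)
{
    assert(clock_hz > 0.0);
    return us * clock_hz / 1.0e6;
}

/* Ratio of the measured time to the time implied by the documented
 * cycle count. A value of 1.0 means the measurement matches the document. */
static double slowdown(double measured_us, uint32_t doc_cycles, double clock_hz)
{
    return measured_us / cycles_to_us(doc_cycles, clock_hz);
}
```

With the figures quoted above, slowdown(86.0, 4218, 1.0e9) comes out near 20.4, while the natural C and intrinsic C measurements are roughly 1.45 and 1.71 times their documented cycle counts. Only the assembly call is far outside the range of the other two.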

Thanks

Guangyi Wang

over 14 years ago

0 Rahul Prabhu over 14 years ago

TI Guru 116170 points

Guangyi,

The performance benchmarks published in the TI document were obtained using a C64x+ cycle-accurate simulator that assumes all code is in internal memory, so the code is unlikely to give you the same numbers on the actual device because of the additional time required for memory access and caching.

Coming to the performance of the assembly function as compared to the natural C and the intrinsic C versions, it appears that you may have set up the test for the assembly source function differently from the test bench for the other versions. We provide a test bench for every function in the library, in the src folder of the library, that can be used as a reference implementation. Can you run the test bench for the fft16x16 function and report the numbers you see?
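One way to keep the test setup identical across the three versions is to push every call through the same timing wrapper. The following is a minimal sketch in C, not the library's test bench: best_cycles, counter_fn and kernel_fn are hypothetical names, and sim_counter and sim_kernel are host-side stand-ins so the sketch runs without target hardware. On the C64x+ the counter read would typically come from the TSCL register; that part is an assumption about the target setup and is not shown here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*counter_fn)(void);  /* reads a free-running cycle counter */
typedef void (*kernel_fn)(void *ctx);  /* wraps one call of the routine under test */

/* Run the kernel once untimed (warms the caches), then time `runs` calls and
 * return the smallest cycle count with the counter-read overhead removed. */
static uint32_t best_cycles(kernel_fn kernel, void *ctx, counter_fn now, int runs)
{
    uint32_t t0, t1, overhead, best = UINT32_MAX;
    int i;

    assert(kernel && now && runs > 0);

    /* Cost of two back-to-back counter reads with nothing in between. */
    t0 = now();
    t1 = now();
    overhead = (uint32_t)(t1 - t0);

    kernel(ctx);  /* warm-up call, not timed */

    for (i = 0; i < runs; i++) {
        uint32_t d;
        t0 = now();
        kernel(ctx);
        t1 = now();
        d = (uint32_t)(t1 - t0);            /* modulo 2^32, safe across one wrap */
        d = (d > overhead) ? d - overhead : 0;
        if (d < best)
            best = d;
    }
    return best;
}

/* Hypothetical host-side stand-ins: a simulated counter and a kernel whose
 * first ("cold") call costs more than the following ("warm") calls. */
static uint32_t sim_time;
static uint32_t sim_counter(void) { return sim_time; }
static void sim_kernel(void *ctx)
{
    int *calls = (int *)ctx;
    sim_time += (*calls == 0) ? 100u : 40u;
    (*calls)++;
}
```

Wrapping DSP_fft16x16_cn, DSP_fft16x16_i and DSP_fft16x16 in three small kernel_fn functions and passing each to the same wrapper would rule out differences in how the individual measurements are taken.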

Regards,

Rahul

0 Guangyi Wang over 14 years ago in reply to Rahul Prabhu

Prodigy 80 points

Hi Rahul

We use internal RAM for code space. So, the cycle count should match the benchmark.

The most important issue we need to solve is the processing timing for assembly FFT implementation. It's even slower than that of the natural C implementation.

Here are the three FFT function calls used in our DSP project for comparison:

// Natural C

DSP_fft16x16_cn(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);

// Intrinsic C

DSP_fft16x16_i(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);

// Assembly

DSP_fft16x16(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);

Could you please help us to find out why the assembly implementation is so slow?

Thanks

Guangyi

0 Rahul Prabhu over 14 years ago in reply to Guangyi Wang

TI Guru 116170 points

Guangyi,

Assuming you have the same memory setup for all tests, the benchmarks that you are describing seem unlikely unless you have a timer/counter that overflows and wraps around, or the C code for some reason exits without computing the entire FFT. Have you compared the outputs to see if they are the same? Is it possible for you to send your test project so that we can replicate the scenario or maybe review the code for you?
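Both checks can be sketched in a few lines of C. This is only an illustration under stated assumptions: elapsed_ticks and first_mismatch are hypothetical helper names, the counter is assumed to be a free-running 32-bit counter, and the buffers are assumed to hold 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the two
 * reads still gives the right difference. More than one wrap between the
 * reads cannot be detected this way. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Compare two output buffers sample by sample.
 * Returns the index of the first differing sample, or n if they match. */
static size_t first_mismatch(const int16_t *a, const int16_t *b, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}
```

If the output buffers hold interleaved real and imaginary parts, n would be twice the number of complex FFT points. A return value equal to n from first_mismatch means the two implementations produced identical results for that input.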

Regards,

Rahul

0 Guangyi Wang over 14 years ago in reply to Rahul Prabhu

Prodigy 80 points

Hi Rahul

Please send me your contact info

Thanks

Guangyi

0 Guangyi Wang over 14 years ago in reply to Rahul Prabhu

Prodigy 80 points

Hi Rahul

We have compared the outputs of the natural C and assembly implementations, and they are the same. The problem is that the assembly implementation consumes many more cycles than expected, even more than the natural C implementation.

Is there anything extra we need to do when building the assembly routine from the TI library, such as setting up build properties? We are using CCSv4.1.2.

Thanks

Guangyi

0 Rahul Prabhu over 14 years ago in reply to Guangyi Wang

TI Guru 116170 points

Guangyi,

Can you send your code to the Developer Mailing List mentioned here so that we can take a look at the issue?

http://processors.wiki.ti.com/index.php/Software_libraries#Developer_Mailing_List

Regards,

Rahul

0 Eyal Amir over 13 years ago in reply to Rahul Prabhu

Prodigy 10 points

Did you ever figure out the problem? We're having similar issues.
