Guangyi,
The performance benchmarks published in the TI document were obtained with a C64x+ cycle-accurate simulator that assumes all code is in internal memory, so the code is unlikely to give you the same numbers on the actual device, because of the additional time required for memory accesses and caching.
As for the performance of the assembly function compared with the natural C and intrinsic C versions, it appears that the test setup for the assembly function may differ from the test bench used for the other versions. We provide a test bench for every function in the library, in the library's src folder, that can be used as a reference implementation. Can you run the test bench for the fft16x16 function and report the numbers you see?
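To make the memory and cache effect visible, one option is to time the same call several times and compare the first (cold) run with the best later (warm) run. The following is only a minimal sketch under stated assumptions: `bench_cold_warm`, `counter_fn`, `kernel_fn`, and the `sim_*` stand-ins are hypothetical names that are not part of DSPLIB, and on the device you would pass a real free-running cycle counter (for example one based on the C64x+ time-stamp counter) and a wrapper around the FFT call instead of the simulated ones.

```c
#include <stdint.h>

/* Hypothetical hooks: a free-running cycle counter and the kernel under test. */
typedef uint32_t (*counter_fn)(void);
typedef void (*kernel_fn)(void);

typedef struct {
    uint32_t first_run; /* cycles for the first (cold-cache) call */
    uint32_t best_run;  /* fewest cycles seen over all calls      */
} bench_result;

/* Time `kernel` `runs` times and record the cold and best cycle counts. */
static bench_result bench_cold_warm(kernel_fn kernel, counter_fn now, int runs)
{
    bench_result r = { 0u, UINT32_MAX };
    int i;
    for (i = 0; i < runs; ++i) {
        uint32_t start = now();
        uint32_t cycles;
        kernel();
        cycles = now() - start; /* unsigned subtraction survives one wrap */
        if (i == 0) r.first_run = cycles;
        if (cycles < r.best_run) r.best_run = cycles;
    }
    return r;
}

/* Stand-ins so the sketch runs on a host PC (assumptions, not real hardware):
   the first call "costs" 5000 cycles, later calls 1200. The simulated clock
   starts just below the 32-bit limit so the first measurement wraps. */
static uint32_t sim_clock = 0xFFFFFF00u;
static int sim_calls = 0;

static uint32_t sim_counter(void) { return sim_clock; }

static void sim_kernel(void)
{
    sim_clock += (sim_calls++ == 0) ? 5000u : 1200u;
}
```

A large gap between `first_run` and `best_run` on the device would point to cache misses or external memory placement rather than to the function itself.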
Regards,
Rahul
Hi Rahul
// Natural C
DSP_fft16x16_cn(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);
// Intrinsic C
DSP_fft16x16_i(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);
// Assembly
DSP_fft16x16(&gTwiddleFft[0], FFT_SIZE_IMP, fftInPtr, (Int16*) &gFftOutUl[0]);
Assuming you have the same memory setup for all tests, the benchmarks you are describing seem unlikely unless a timer/counter is overflowing and wrapping around, or the C code for some reason exits without computing the entire FFT. Have you compared the outputs to see if they are the same? Is it possible for you to send your test project so that we can replicate the scenario, or maybe review the code for you?
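Both checks mentioned above can be sketched in a few lines of portable C. This is an illustration only: `elapsed32` and `first_mismatch` are hypothetical helper names, not DSPLIB functions, and the sketch assumes a free-running 32-bit counter that wraps at most once between the two readings.

```c
#include <stddef.h>
#include <stdint.h>

/* Elapsed ticks of a free-running 32-bit counter. Unsigned arithmetic is
   modulo 2^32, so the result stays correct across a single wraparound. */
static uint32_t elapsed32(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Index of the first differing 16-bit sample, or -1 if the buffers match. */
static long first_mismatch(const int16_t *a, const int16_t *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

For a complex FFT of N points with interleaved real and imaginary parts, `n` would be 2 * N. If the counter can wrap more than once during a measurement, a wider counter is needed, because the subtraction alone cannot detect that case.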
Please send me your contact info.
Thanks
Guangyi
We have compared the outputs of the natural C and assembly implementations, and they are the same. The problem is that the assembly implementation consumes many more cycles than expected, even more than the natural C implementation.
Is there anything extra we need to do when building the assembly routine from the TI library, such as setting particular build properties? We are using CCSv4.1.2.
Can you send your code to the Developer Mailing List mentioned here so that we can take a look at the issue?
http://processors.wiki.ti.com/index.php/Software_libraries#Developer_Mailing_List
Did you ever figure out the problem? We're having similar issues.