TMS320C6678: Performance of Computing Very Large FFT

Part Number: TMS320C6678
Other Parts Discussed in Thread: FFTLIB

Hello!
I am using the VLFFT demo from e2e.ti.com/.../303599 to compute a 1024K-point FFT. It runs on the TMDSEVM6678LE evaluation board.

I use CCS 6.2.0.
The VLFFT project is built with the following products:
XDCTools 3.23.4.60
EDMA3 LLD 2.11.5
IPC 1.24.2.27
MCSDK 2.1.2.6
SYS/BIOS 6.33.4.39

I can't reproduce the performance reported in the document "Very large FFT for TMS320C6678 processors". My results for the 1024K FFT are:
8 cores: 32.484165 ms vs. 6.403 ms in the document;
4 cores: 35.519242 ms vs. 9.605 ms in the document;
2 cores: 58.152080 ms vs. 19.328 ms in the document;
1 core: 115.402698 ms vs. 38.557 ms in the document.

Also, the computed results do not match the reference results. The log shows:
max error index: 0
real, 8.515877, real_ref: 36592189439.999998
imag, 8.515877, imag_ref: 36592189439.999998
Fail!!!

Why does this happen?

vlfft.zip

VLFFT_log.txt
[C66xx_0] DMA channel 0: 0
DMA channel 1: 0
pass init 1
pass init 2
pass init 3
max num of cores: 8
num of working cores: 8
total size FFT: 1048576
1st iter FFT: 1024
2nd iter FFT: 1024
[C66xx_1] DMA channel 0: 2
[C66xx_2] DMA channel 0: 4
[C66xx_3] DMA channel 0: 6
[C66xx_4] DMA channel 0: 8
[C66xx_5] DMA channel 0: 10
[C66xx_6] DMA channel 0: 12
[C66xx_7] DMA channel 0: 14
[C66xx_1] DMA channel 1: 2
[C66xx_2] DMA channel 1: 4
[C66xx_3] DMA channel 1: 6
[C66xx_4] DMA channel 1: 8
[C66xx_5] DMA channel 1: 10
[C66xx_6] DMA channel 1: 12
[C66xx_7] DMA channel 1: 14
[C66xx_1] The test start!
[C66xx_2] The test start!
[C66xx_4] The test start!
[C66xx_5] The test start!
[C66xx_6] The test start!
[C66xx_3] The test start!
[C66xx_7] The test start!
[C66xx_0] Core0 start initializing data array
Core0 finish initializing data array
Sync up all the cores 
[C66xx_1] vlfft initial sync
[C66xx_2] vlfft initial sync
[C66xx_3] vlfft initial sync
[C66xx_4] vlfft initial sync
[C66xx_5] vlfft initial sync
[C66xx_6] vlfft initial sync
[C66xx_7] vlfft initial sync
[C66xx_0] The test is starting! 
   start of loop: 0 
[C66xx_3] The test is complete!
[C66xx_4] The test is complete!
[C66xx_5] The test is complete!
[C66xx_6] The test is complete!
[C66xx_7] The test is complete!
[C66xx_0] The test is complete
[C66xx_1] The test is complete!
[C66xx_2] The test is complete!
[C66xx_0]   Number of Clocks per FFT  =    32484165 
  Avg timer per fft  =    32.484165 ms 
   max error index:    0
   real, 8.515877, real_ref: 36592189439.999998
   imag, 8.515877, imag_ref: 36592189439.999998
   Fail!!! 






[C66xx_0] DMA channel 0: 0
DMA channel 1: 0
pass init 1
pass init 2
pass init 3
max num of cores: 8
num of working cores: 4
total size FFT: 1048576
1st iter FFT: 1024
2nd iter FFT: 1024
[C66xx_1] DMA channel 0: 2
[C66xx_4] DMA channel 0: 8
[C66xx_6] DMA channel 0: 12
[C66xx_1] DMA channel 1: 2
[C66xx_4] DMA channel 1: 8
[C66xx_6] DMA channel 1: 12
[C66xx_1] The test start!
[C66xx_4] The test start!
[C66xx_6] The test start!
[C66xx_3] DMA channel 0: 6
DMA channel 1: 6
The test start!
[C66xx_5] DMA channel 0: 10
DMA channel 1: 10
The test start!
[C66xx_2] DMA channel 0: 4
[C66xx_7] DMA channel 0: 14
[C66xx_2] DMA channel 1: 4
[C66xx_7] DMA channel 1: 14
[C66xx_2] The test start!
[C66xx_7] The test start!
[C66xx_0] Core0 start initializing data array
Core0 finish initializing data array
Sync up all the cores 
[C66xx_1] vlfft initial sync
[C66xx_2] vlfft initial sync
[C66xx_3] vlfft initial sync
[C66xx_4] vlfft initial sync
[C66xx_5] vlfft initial sync
[C66xx_6] vlfft initial sync
[C66xx_7] vlfft initial sync
[C66xx_0] The test is starting! 
   start of loop: 0 
[C66xx_3] The test is complete!
[C66xx_4] The test is complete!
[C66xx_5] The test is complete!
[C66xx_6] The test is complete!
[C66xx_7] The test is complete!
[C66xx_0] The test is complete
[C66xx_1] The test is complete!
[C66xx_2] The test is complete!
[C66xx_0]   Number of Clocks per FFT  =    35519242 
  Avg timer per fft  =    35.519242 ms 
   max error index:    0
   real, 36860624895.999998, real_ref: 8.578377
   imag, 36860624895.999998, imag_ref: 8.578377
   Fail!!!




[C66xx_7] DMA channel 0: 14
DMA channel 1: 14
[C66xx_5] DMA channel 0: 10
[C66xx_6] DMA channel 0: 12
[C66xx_5] DMA channel 1: 10
[C66xx_6] DMA channel 1: 12
[C66xx_1] DMA channel 0: 2
DMA channel 1: 2
[C66xx_3] DMA channel 0: 6
DMA channel 1: 6
[C66xx_0] DMA channel 0: 0
DMA channel 1: 0
pass init 1
pass init 2
pass init 3
max num of cores: 8
num of working cores: 2
total size FFT: 1048576
1st iter FFT: 1024
2nd iter FFT: 1024
[C66xx_2] DMA channel 0: 4
[C66xx_4] DMA channel 0: 8
[C66xx_2] DMA channel 1: 4
[C66xx_4] DMA channel 1: 8
The test start!
[C66xx_1] The test start!
[C66xx_3] The test start!
[C66xx_5] The test start!
[C66xx_6] The test start!
[C66xx_7] The test start!
[C66xx_2] The test start!
[C66xx_0] Core0 start initializing data array
Core0 finish initializing data array
Sync up all the cores 
[C66xx_1] vlfft initial sync
[C66xx_2] vlfft initial sync
[C66xx_3] vlfft initial sync
[C66xx_4] vlfft initial sync
[C66xx_5] vlfft initial sync
[C66xx_6] vlfft initial sync
[C66xx_7] vlfft initial sync
[C66xx_0] The test is starting! 
   start of loop: 0 
[C66xx_6] The test is complete!
[C66xx_7] The test is complete!
[C66xx_0] The test is complete
[C66xx_1] The test is complete!
[C66xx_2] The test is complete!
[C66xx_3] The test is complete!
[C66xx_4] The test is complete!
[C66xx_5] The test is complete!
[C66xx_0]   Number of Clocks per FFT  =    58152080 
  Avg timer per fft  =    58.152080 ms 




[C66xx_0] DMA channel 0: 0
DMA channel 1: 0
pass init 1
pass init 2
pass init 3
max num of cores: 8
num of working cores: 1
total size FFT: 1048576
1st iter FFT: 1024
2nd iter FFT: 1024
[C66xx_1] DMA channel 0: 2
[C66xx_2] DMA channel 0: 4
[C66xx_7] DMA channel 0: 14
[C66xx_1] DMA channel 1: 2
[C66xx_2] DMA channel 1: 4
[C66xx_7] DMA channel 1: 14
[C66xx_1] The test start!
[C66xx_2] The test start!
[C66xx_7] The test start!
[C66xx_5] DMA channel 0: 10
[C66xx_6] DMA channel 0: 12
[C66xx_5] DMA channel 1: 10
[C66xx_6] DMA channel 1: 12
[C66xx_5] The test start!
[C66xx_6] The test start!
[C66xx_4] DMA channel 0: 8
DMA channel 1: 8
The test start!
[C66xx_3] DMA channel 0: 6
DMA channel 1: 6
The test start!
[C66xx_0] Core0 start initializing data array
Core0 finish initializing data array
Sync up all the cores 
[C66xx_1] vlfft initial sync
[C66xx_2] vlfft initial sync
[C66xx_3] vlfft initial sync
[C66xx_4] vlfft initial sync
[C66xx_5] vlfft initial sync
[C66xx_6] vlfft initial sync
[C66xx_7] vlfft initial sync
[C66xx_0] The test is starting! 
   start of loop: 0 
[C66xx_5] The test is complete!
[C66xx_6] The test is complete!
[C66xx_7] The test is complete!
[C66xx_0] The test is complete
[C66xx_1] The test is complete!
[C66xx_2] The test is complete!
[C66xx_3] The test is complete!
[C66xx_4] The test is complete!
[C66xx_0]   Number of Clocks per FFT  =    115402698 
  Avg timer per fft  =    115.402698 ms 
   max error index:    0
   real, 8.519783, real_ref: 36592189439.999998
   imag, 8.519783, imag_ref: 36592189439.999998
   Fail!!!