Hello,
I have built version 9.02 of FFTLIB to test the performance of our J722SXH01EVM reference board. I am running sixteen 1024-point FFTs, trying to match the documented figure of 31,650 cycles for a 1024x16 FFT. I am using the FFTLIB_fft1dBatched_i32fc_c32fc_o32fc function for this test, but I measure 568,441 cycles to run the batch. That is roughly 18x slower than expected, close to the batch size of 16, which makes me suspect that batching, vectorization, or something similar is not working. The documentation is minimal and the code is very hard to read, so I am finding this difficult to debug.
The code snippet I am running is here:
curTime = ClockP_getTimeUsec(); /* get time as measured by timer associated with ClockP module */

#define NUM_POINTS 1024
#define NUM_CHANNELS 16

uint8_t pBlock[FFTLIB_FFT1DBATCHED_I32FC_C32FC_O32FC_PBLOCK_SIZE];

FFTLIB_F32* pX = (FFTLIB_F32 *) memalign (128, NUM_POINTS*NUM_CHANNELS*2 * sizeof (FFTLIB_F32));
FFTLIB_F32* pW = (FFTLIB_F32 *) memalign (128, NUM_POINTS*2 * sizeof (FFTLIB_F32));
FFTLIB_F32* pY = (FFTLIB_F32 *) memalign (128, NUM_POINTS*NUM_CHANNELS*2 * sizeof (FFTLIB_F32));

FFTLIB_bufParams1D_t bufParamsData;
FFTLIB_bufParams1D_t bufParamsTw;
uint32_t numPoints = NUM_POINTS;
uint32_t numChannels = NUM_CHANNELS;

bufParamsData.data_type = FFTLIB_FLOAT32;
bufParamsData.dim_x = NUM_POINTS*NUM_CHANNELS*2;
bufParamsTw.data_type = FFTLIB_FLOAT32;
bufParamsTw.dim_x = NUM_POINTS*2;

tw_gen_f32(pW, numPoints); // Generate twiddle factors

FFTLIB_STATUS status = FFTLIB_fft1dBatched_i32fc_c32fc_o32fc_checkParams(pX, &bufParamsData, pW, &bufParamsTw, pY, &bufParamsData, numPoints, numChannels, pBlock);
uint64_t checkTime = ClockP_getTimeUsec();
DebugP_log("FFTLIB_STATUS = %d TIME = %d usecs\r\n", status, (uint32_t)(checkTime-curTime));

checkTime = ClockP_getTimeUsec();
int i = 0;
status = FFTLIB_fft1dBatched_i32fc_c32fc_o32fc_init(pX, &bufParamsData, pW, &bufParamsTw, pY, &bufParamsData, numPoints, numChannels, pBlock);
status = FFTLIB_fft1dBatched_i32fc_c32fc_o32fc_kernel(pX, &bufParamsData, pW, &bufParamsTw, pY, &bufParamsData, numPoints, numChannels, pBlock);
uint64_t fftTime = ClockP_getTimeUsec();
DebugP_log("FFTLIB_STATUS = %d TIME = %d usecs\r\n", status, (uint32_t)(fftTime-checkTime));

curTime = ClockP_getTimeUsec() - curTime; /* get time and calculate diff, ClockP returns 64b value so there wont be overflow here */
DebugP_log("FFT FLOAT32 ... DONE (Measured time = %d usecs) !!!\r\n", (uint32_t)curTime);
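Since the snippet logs microseconds while the documented figure is in cycles, the comparison depends on a conversion. The following is only a minimal sketch of that arithmetic: `ASSUMED_CPU_HZ`, `usec_to_cycles` and `slowdown_ratio` are hypothetical names for illustration, not part of FFTLIB or the SDK, and the 1 GHz clock is an assumption that should be replaced with the actual core clock.

```c
#include <stdint.h>

/* Assumption: core clock in Hz. Hypothetical value; substitute the real one. */
#define ASSUMED_CPU_HZ 1000000000ULL

/* Convert an elapsed time in microseconds to cycles at the given clock rate.
 * Hypothetical helper; it may overflow for very long intervals. */
static uint64_t usec_to_cycles(uint64_t usec, uint64_t cpu_hz)
{
    return (usec * cpu_hz) / 1000000ULL;
}

/* Ratio of measured cycles to the documented cycle count. */
static double slowdown_ratio(uint64_t measured_cycles, uint64_t documented_cycles)
{
    return (double)measured_cycles / (double)documented_cycles;
}
```

For example, `slowdown_ratio(568441, 31650)` gives the ratio between the measured and documented figures quoted above, and `usec_to_cycles(fftTime - checkTime, ASSUMED_CPU_HZ)` would turn the logged init-plus-kernel interval into cycles under the assumed clock.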