Hi TI Team,
Following a suggestion from a TI agent, I removed the FFT-related part from my previous post (linked below) to keep that discussion focused.
This separate post is specifically about the FFT processing performance issue I'm facing.
I'm using the AM62x platform and running an FFT on 128 real-valued double-precision samples (stored as interleaved real/imaginary pairs, with the imaginary parts set to zero) by calling `fft_calculation_128(x, mag, phase, 3)`, and I measure the execution time with `ClockP_getTimeUsec()`. The measured time is consistently around 25 ms, which seems unusually high for a 128-point FFT. Here is my test code:
```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>
#include <drivers/soc/am62x/soc.h>
#include <kernel/dpl/DebugP.h>
#include <kernel/dpl/TimerP.h>
#include <kernel/dpl/ClockP.h>
#include <drivers/ipc_notify.h>
#include <drivers/ipc_rpmsg.h>
#include </drivers/mcspi/v0/cslr_mcspi.h>
#include "ti_drivers_config.h"
#include "ti_drivers_open_close.h"
#include "ti_board_open_close.h"
#include "ti_dpl_config.h"
#include "/ADE9000/ADE9000RegMap.h"
#include "/ADE9000/ADE9000.h"
#include "s_Pripherals_Config.h"
#include <kernel/dpl/HwiP.h>
#include "Calibration.h"
#include "/FFT/fft_opt.h"

#define FFT_POINTS     128U   /* number of complex input samples */
#define NUM_HARMONICS  3U     /* harmonics requested from fft_calculation_128() */

ClockP_Params clockParams;
ClockP_Object clockObj;

/* Clear DIR bit 13 of the M4F GPIO bank so the chip-select pin is an output. */
void CS_GPIO_Init(void)
{
    M4F_GPIO->DIR &= ~(0x2000U);
}

/* Only fills in the ClockP parameters; ClockP_construct() is never called,
 * so no ClockP clock is actually created or started here. */
void Clock_Init(void)
{
    ClockP_Params_init(&clockParams);
    clockParams.timeout = ClockP_usecToTicks(1);
    clockParams.period  = clockParams.timeout;
    clockParams.start   = 1;
}

/* sin(pi * k / 128) for k = 0..127: one half cycle of a sine wave. */
const double sine_half_cycle_LUT[128] = {
    0.000000, 0.024541, 0.049068, 0.073565, 0.098017, 0.122411, 0.146730, 0.170962,
    0.195090, 0.219101, 0.242980, 0.266713, 0.290285, 0.313682, 0.336890, 0.359895,
    0.382683, 0.405241, 0.427555, 0.449611, 0.471397, 0.492898, 0.514103, 0.534998,
    0.555570, 0.575808, 0.595699, 0.615232, 0.634393, 0.653173, 0.671559, 0.689541,
    0.707107, 0.724247, 0.740951, 0.757209, 0.773010, 0.788346, 0.803208, 0.817585,
    0.831470, 0.844854, 0.857729, 0.870087, 0.881921, 0.893224, 0.903989, 0.914210,
    0.923880, 0.932993, 0.941544, 0.949528, 0.956940, 0.963776, 0.970031, 0.975702,
    0.980785, 0.985278, 0.989177, 0.992480, 0.995185, 0.997290, 0.998795, 0.999699,
    1.000000, 0.999699, 0.998795, 0.997290, 0.995185, 0.992480, 0.989177, 0.985278,
    0.980785, 0.975702, 0.970031, 0.963776, 0.956940, 0.949528, 0.941544, 0.932993,
    0.923880, 0.914210, 0.903989, 0.893224, 0.881921, 0.870087, 0.857729, 0.844854,
    0.831470, 0.817585, 0.803208, 0.788346, 0.773010, 0.757209, 0.740951, 0.724247,
    0.707107, 0.689541, 0.671559, 0.653173, 0.634393, 0.615232, 0.595699, 0.575808,
    0.555570, 0.534998, 0.514103, 0.492898, 0.471397, 0.449611, 0.427555, 0.405241,
    0.382683, 0.359895, 0.336890, 0.313682, 0.290285, 0.266713, 0.242980, 0.219101,
    0.195090, 0.170962, 0.146730, 0.122411, 0.098017, 0.073565, 0.049068, 0.024541
};

void hello_world_main(void *args)
{
    (void)args;
    double test = 1.0;
    double x[2 * FFT_POINTS];        /* interleaved: x[2k] = Re, x[2k + 1] = Im */
    double mag[NUM_HARMONICS];
    double phase[NUM_HARMONICS];

    /* Open drivers (including the UART used for the console) */
    Drivers_open();
    Board_driversOpen();
    Clock_Init();
    CS_GPIO_Init();
    Timer_Init();

    /* Load the half-sine test signal as real samples with zero imaginary parts. */
    for (uint16_t i = 0; i < FFT_POINTS; i++)
    {
        x[i * 2]     = sine_half_cycle_LUT[i];
        x[i * 2 + 1] = 0.0;
    }

    DebugP_log("test is %f\r\n", test);

    /* ClockP_getTimeUsec() returns uint64_t; keep the full width so the
     * difference does not wrap at 65536 us. */
    uint64_t startTime = ClockP_getTimeUsec();
    fft_calculation_128(x, mag, phase, NUM_HARMONICS);
    uint64_t endTime = ClockP_getTimeUsec();

    /* Print the FFT result, which fft_calculation_128() leaves in x[]. */
    for (uint8_t i = 0; i < FFT_POINTS; i++)
    {
        DebugP_log("fft amounts of x[%d] is %f + %f j\r\n", i, x[i * 2], x[i * 2 + 1]);
        ClockP_usleep(100);          /* short pause between log lines */
    }

    DebugP_log("the time spend of fft calculation is %u us\r\n",
               (unsigned int)(endTime - startTime));

    while (1)
    {
        /* Idle forever; the close calls below are never reached. */
    }

    Board_driversClose();
    Drivers_close();
}
```
The console output is:

```
fft amounts of x[0] is 81.483242 + 0.000000 j
fft amounts of x[1] is -27.166536 + 0.000000 j
fft amounts of x[2] is -5.436583 + 0.000000 j
fft amounts of x[3] is -2.332302 + 0.000000 j
fft amounts of x[4] is -1.297546 + 0.000000 j
fft amounts of x[5] is -0.827208 + 0.000000 j
...
the time spend of fft calculation is 25271 us
```
This high latency is causing issues for real-time processing in my application.
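Before looking at clocks or the FPU, it seems worth ruling out a measurement artifact. In the MCU+ SDK, `ClockP_getTimeUsec()` returns a 64-bit microsecond count (`uint64_t`); if the start and end timestamps are narrowed to `uint16_t`, only their low 16 bits survive, so the difference is correct only modulo 65536 us and goes negative whenever the low 16 bits wrap during the measurement. The host-side C sketch below has no SDK dependencies; the helper names are hypothetical and the timestamp values are made up purely to show the effect:

```c
#include <stdint.h>
#include <assert.h>

/* Correct: keep both ClockP_getTimeUsec()-style timestamps at full 64-bit width. */
static uint64_t elapsed_us_u64(uint64_t start, uint64_t end)
{
    assert(end >= start);             /* the microsecond counter only counts up */
    return end - start;
}

/* Pitfall: timestamps narrowed to uint16_t keep only their value modulo 65536.
 * The subtraction is done in int after integer promotion, so it can go negative. */
static int elapsed_us_u16(uint64_t start, uint64_t end)
{
    uint16_t s = (uint16_t)start;
    uint16_t e = (uint16_t)end;
    return (int)e - (int)s;
}
```

For example, `elapsed_us_u16(60000u, 70000u)` yields -55536 although only 10000 us elapsed, and `elapsed_us_u16(0u, 90000u)` yields 24464 for a 90000 us interval. So a 16-bit result of 25271 us cannot by itself rule out 25271 + k * 65536 us, which is why both timestamps should stay `uint64_t`.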
I would like to ask:
- Could this performance issue be related to the system clock configuration, or to the CPU frequency not being set properly?
- Is the FPU (Floating Point Unit) enabled by default on AM62x? If not, how can I confirm that and enable it properly in my environment?
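For the FPU question, here is a sketch of the two checks I have in mind, assuming the code runs on the AM62x Cortex-M4F core (the `M4F_GPIO` register access suggests it does) and that the compiler defines the Arm C Language Extensions macro `__ARM_FP` (GCC and LLVM-based compilers such as TI Arm Clang should). At compile time, bit 2 (0x4) of `__ARM_FP` means single-precision FP hardware and bit 3 (0x8) double precision. At run time, the ARMv7-M CPACR register (0xE000ED88) must grant access to coprocessors CP10/CP11 (bits 23:20). One detail that may matter here: per Arm's documentation, the Cortex-M4 FPU (FPv4-SP) supports single precision only, so `double` arithmetic is done by software routines even when the FPU is enabled. The helper names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* ACLE __ARM_FP feature bits. */
#define ACLE_FP_SINGLE 0x4u   /* bit 2: single-precision FP hardware */
#define ACLE_FP_DOUBLE 0x8u   /* bit 3: double-precision FP hardware */

/* Compile time: FP hardware the compiler is generating code for.
 * 0 means __ARM_FP is undefined (soft-float build or a non-Arm host). */
static unsigned compiler_fp_flags(void)
{
#if defined(__ARM_FP)
    return (unsigned)__ARM_FP;
#else
    return 0u;
#endif
}

static bool fp_has_single(unsigned flags) { return (flags & ACLE_FP_SINGLE) != 0u; }
static bool fp_has_double(unsigned flags) { return (flags & ACLE_FP_DOUBLE) != 0u; }

/* Run time: CPACR.CP10 (bits 21:20) and CP11 (bits 23:22) set to 0b11 grant
 * full access; with 0b00, any FP instruction raises a NOCP UsageFault. */
static bool cpacr_fpu_full_access(uint32_t cpacr)
{
    return ((cpacr >> 20) & 0xFu) == 0xFu;
}

#if defined(__ARM_ARCH_7EM__)
/* Target-only check (Cortex-M4F): CPACR is in the System Control Block. */
static void check_fpu_setup(void)
{
    uint32_t cpacr = *(volatile const uint32_t *)0xE000ED88u;
    assert(fp_has_single(compiler_fp_flags()));  /* built for hardware float */
    assert(cpacr_fpu_full_access(cpacr));         /* FPU enabled at run time */
}
#endif
```

On the target, calling `check_fpu_setup()` early in `hello_world_main()` would trip an assert if the build is soft-float or if CP10/CP11 access was not enabled. The other half of the check is the project's compiler options (for GCC/Clang-style toolchains, `-mfpu` and `-mfloat-abi`).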
I’d appreciate any guidance to improve the FFT performance on this platform.
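Since the M4F FPU only handles `float` in hardware, one option to compare against is a single-precision FFT. Below is a minimal sketch of a 128-point iterative radix-2 FFT (Cooley-Tukey, decimation in time) with a twiddle table built once at start-up, working in place on the same interleaved re/im layout as the `x[]` buffer in the test code. It is not TI library code and not the contents of `fft_calculation_128()`: `fft128_init()` and `fft128_f32()` are hypothetical names, the twiddles come from a rotation recurrence so no `sin()`/`cos()` calls are needed, and whether single precision is accurate enough for the harmonic measurements is an assumption that still needs checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

#define FFT_N 128u  /* transform length, must be a power of two */

/* cos(2*pi/128) and sin(2*pi/128): one twiddle step, hard-coded to avoid libm. */
#define TW_STEP_COS 0.99879545620517239
#define TW_STEP_SIN 0.04906767432741801

static float tw_re[FFT_N / 2];  /* cos(2*pi*k/N) */
static float tw_im[FFT_N / 2];  /* -sin(2*pi*k/N): forward-transform sign */

/* Fill the twiddle table by rotating (1, 0) by 2*pi/N, accumulating in double. */
void fft128_init(void)
{
    double c = 1.0, s = 0.0;
    for (uint32_t k = 0; k < FFT_N / 2; k++) {
        tw_re[k] = (float)c;
        tw_im[k] = (float)(-s);
        double c_next = c * TW_STEP_COS - s * TW_STEP_SIN;
        s = s * TW_STEP_COS + c * TW_STEP_SIN;
        c = c_next;
    }
}

/* In-place forward FFT; data holds FFT_N complex values as re0, im0, re1, im1, ... */
void fft128_f32(float *data)
{
    assert(data != NULL);

    /* Reorder samples into bit-reversed index order (j tracks bitrev(i)). */
    uint32_t j = 0;
    for (uint32_t i = 0; i < FFT_N; i++) {
        if (i < j) {
            float tr = data[2 * i], ti = data[2 * i + 1];
            data[2 * i] = data[2 * j];
            data[2 * i + 1] = data[2 * j + 1];
            data[2 * j] = tr;
            data[2 * j + 1] = ti;
        }
        uint32_t bit = FFT_N >> 1;  /* add 1 to j in bit-reversed order */
        while (j & bit) {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
    }

    /* log2(N) butterfly stages; the sub-transform length doubles each stage. */
    for (uint32_t len = 2; len <= FFT_N; len <<= 1) {
        uint32_t half = len >> 1;
        uint32_t stride = FFT_N / len;  /* twiddle index step for this stage */
        for (uint32_t base = 0; base < FFT_N; base += len) {
            for (uint32_t k = 0; k < half; k++) {
                float wr = tw_re[k * stride];
                float wi = tw_im[k * stride];
                uint32_t a = 2 * (base + k);
                uint32_t b = 2 * (base + k + half);
                /* t = W * data[b], then the butterfly a +/- t */
                float tr = data[b] * wr - data[b + 1] * wi;
                float ti = data[b] * wi + data[b + 1] * wr;
                data[b] = data[a] - tr;
                data[b + 1] = data[a + 1] - ti;
                data[a] += tr;
                data[a + 1] += ti;
            }
        }
    }
}
```

Call `fft128_init()` once, then `fft128_f32(buf)` on a `float buf[256]`; bin k ends up in `buf[2*k]` (real) and `buf[2*k + 1]` (imaginary), from which magnitude and phase can be computed with `sqrtf()` and `atan2f()`. Timing this against the double-precision version would show how much of the 25 ms comes from software floating point.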
Best regards,
Soheil