Hi Team,
Could you please provide the FFTLIB API details, along with example code from the SDK demonstrating its usage?
Regards,
Betsy Varughese
The following sections provide details on the FFTLIB (Fast Fourier Transform Library) kernels, along with build and execution guidelines, a description of the common APIs and parameter definitions, and example code demonstrating the use of these APIs.
FFTLIB Kernels: The existing SDK includes implementations of six kernels, listed as follows:
Note: All of the above implementations operate on 1D data, and the data size must be a power of 2. These kernels also require the application to provide pre-generated twiddle factors along with the input data.
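Since the kernels expect a power-of-2 data size, an application may want to validate the size before calling into FFTLIB. The following is a minimal sketch of such a check; is_power_of_two() and require_valid_fft_size() are hypothetical helper names and are not part of the FFTLIB API. Individual kernels may impose further limits (for example a minimum size) that this check does not cover.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when n is a non-zero power of two (1, 2, 4, 8, ...).
// A power of two has exactly one bit set, so clearing the lowest set
// bit with n & (n - 1) must leave zero.
constexpr bool is_power_of_two(uint32_t n)
{
    return n != 0u && (n & (n - 1u)) == 0u;
}

// Hypothetical guard (not an FFTLIB function) that an application could
// call before checkParams/init/kernel. It aborts in debug builds when the
// size is not a power of two.
inline void require_valid_fft_size(uint32_t numPoints)
{
    assert(is_power_of_two(numPoints) && "FFT size must be a power of 2");
}
```

With numPoints = 64, as in the example in this post, the check passes; a size such as 96 would be rejected.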
The following naming convention is used in FFTLIB to identify different FFT kernel variants. Each kernel name encodes its functionality, data dimensionality, processing mode, and data access patterns. The figure below describes the nomenclature:

For instructions on building and running the FFTLIB kernels, please refer to the guidelines provided in the FAQ: Build and Run Instructions: mathlib, dsplib and fftlib
Input and Output Parameters Used:
Kernel API Definitions:
tw_gen(): The function used to pre-calculate the twiddle factors.
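For reference, the values that tw_gen() produces can be written out as follows. This is read off the tw_gen() and float2short() listings in the example code in this post, not taken from the FFTLIB documentation. For each index i, three complex twiddle factors are stored as (real, imaginary) int16_t pairs. Using the multiples i, 2i and 3i is the pattern a radix-4 decomposition would use; that interpretation is an inference from the code.

```latex
W_N^{m} = e^{-j 2\pi m / N}
        = \cos\!\left(\frac{2\pi m}{N}\right) - j\,\sin\!\left(\frac{2\pi m}{N}\right),
\qquad m \in \{\, i,\ 2i,\ 3i \,\}

q(x) = \min\bigl(32767,\ \max\bigl(-32768,\ \lfloor 32767.5\,x + 0.5 \rfloor\bigr)\bigr)
```

Here N is numPoints and q(x) is the rounding and saturation applied to each real and imaginary part, with 32767.5 being the twF2sScale factor. For example, i = 0 gives W = 1, which is stored as the pair (32767, 0).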
The example code for FFTLIB is provided below.
/* Included header files required for fftlib */
#include <fftlib.h>
#include <c7x.h>
#include "common/FFTLIB_bufParams.h"
#include "common/FFTLIB_types.h"
#include <cmath>
#include "common/TI_memory.h"
#include <cstdio>
#include <cstring>
using namespace std;
uint8_t FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock[FFTLIB_FFT1D_I16SC_C16SC_O16SC_PBLOCK_SIZE];
static int16_t staticRefOutput4[] = {
79, -619, 373, -719, -638, 213, 520, -711, 370, -615, -225, 264, 452, 1007, 5, -521, 1068, 180, 1451, 600, 150, 303,
539, -566, -343, -890, 284, -710, -261, -789, -278, -296, 270, 1036, -1357, -143, 169, -136, -35, -922, -617, -426, 197, 8,
97, 866, 894, -56, -491, 567, -484, -383, -1229, -1178, -268, -37, -622, 419, 138, -358, -85, -217, 242, -98, 297, -479,
139, -185, -830, -421, 202, 585, 42, -661, -359, 634, -464, -275, 133, -651, -1088, 1438, -153, 106, 380, 87, -725, -156,
-1361, -738, 98, -272, -1301, 595, -14, -104, -634, -770, -347, -125, -733, -1228, 157, -276, 125, -30, -1433, -466, -1001, 218,
312, -540, 1099, 31, 138, 369, 539, 1080, 706, -381, -770, -1131, 212, -220, -93, 1379, -550, -550};
/* Round to the nearest integer and saturate to the int16_t range */
static int16_t float2short (FFTLIB_D64 x)
{
    x = floor (0.5 + x); /* Explicit rounding to integer */

    if (x >= 32767.0)
        return 32767;
    if (x <= -32768.0)
        return -32768;
    return (int16_t) x;
}
/* Pre-calculate the twiddle factors as interleaved (real, imaginary) int16_t pairs */
void tw_gen (int16_t *pW, uint32_t numPoints)
{
    /* Unsigned indices avoid signed/unsigned comparisons against numPoints */
    uint32_t         i, j, k, t;
    const FFTLIB_D64 PI         = 3.141592654;
    FFTLIB_D64       twF2sScale = 32767.5; /* Scale twiddle factors (max abs value of
                                            * 1) to use full capacity of int16_t */

    t = numPoints >> 2;
    for (j = 1, k = 0; j <= numPoints >> 2; j = j << 2) {
        for (i = 0; i < numPoints >> 2; i += j) {
            /* TODO: Big endian requires different format of Twiddle factors? */
            pW[k]     = float2short (twF2sScale * cos (2 * PI * i / numPoints));
            pW[k + 1] = float2short (twF2sScale * (-sin (2 * PI * i / numPoints)));
            pW[k + 2 * t] =
                float2short (twF2sScale * cos (4 * PI * i / numPoints));
            pW[k + 2 * t + 1] =
                float2short (twF2sScale * (-sin (4 * PI * i / numPoints)));
            pW[k + 4 * t] =
                float2short (twF2sScale * cos (6 * PI * i / numPoints));
            pW[k + 4 * t + 1] =
                float2short (twF2sScale * (-sin (6 * PI * i / numPoints)));
            k += 2;
        }
        k += t * 4;
        t = t >> 2;
    }
}
int16_t pX[] = {
-108, -126, 109, 86, 56, 62, -46, 8, -88, -101, -78, -127, 67, -33, -12, 97, 117, 109, 32, 91, -44, -118, 117, 72, 20, -43,
-66, -77, 82, 19, 31, -84, -22, -115, -46, -114, -30, -73, -115, 25, 118, 25, -18, 92, 61, 90, 67, -39, -21, 88, 11, 85,
-72, 86, 34, -107, -11, -113, -54, 125, -37, 3, 86, 38, -124, 89, 12, 90, 9, -13, -12, 22, 14, 17, -109, 84, 22, -16,
2, -79, 128, 76, 52, -53, -38, -105, 113, 106, 63, -110, -85, -68, -50, -82, 71, -124, 93, -16, 42, -122, 78, -70, -64, -45,
-118, 57, -127, 124, -17, -65, 117, -40, 12, 68, -70, -76, 6, -18, -96, -3, -70, -113, -111, 51, 92, -8, 104, -108};
const uint32_t pShift[] = {0, 0, 0};
const uint32_t numPoints = 64;
const uint32_t numShifts = 3;
int main ()
{
    int16_t      *pY, *pW;
    FFTLIB_STATUS status = FFTLIB_SUCCESS;

    /* Output and twiddle buffers, 64-byte aligned, two int16_t values per point */
    pY = (int16_t *) TI_memalign (64, numPoints * 2 * sizeof (int16_t));
    pW = (int16_t *) TI_memalign (64, numPoints * 2 * sizeof (int16_t));
    if (pY == NULL || pW == NULL) {
        printf ("TEST FAIL(buffer allocation failed)\n");
        return 1;
    }

    FFTLIB_bufParams1D_t bufParamsData;
    FFTLIB_bufParams1D_t bufParamsShift;
    bufParamsData.dim_x      = numPoints * 2;
    bufParamsData.data_type  = FFTLIB_INT16;
    bufParamsShift.dim_x     = numShifts;
    bufParamsShift.data_type = FFTLIB_UINT32;

    tw_gen (pW, numPoints);

    /* Call sequence: checkParams, then init, then kernel */
    status = FFTLIB_fft1d_i16sc_c16sc_o16sc_checkParams (
        (int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData,
        (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);
    if (status == FFTLIB_SUCCESS) {
        status = FFTLIB_fft1d_i16sc_c16sc_o16sc_init (
            (int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData,
            (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);
    }
    if (status == FFTLIB_SUCCESS) {
        status = FFTLIB_fft1d_i16sc_c16sc_o16sc_kernel (
            (int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData,
            (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);
    }
    if (status != FFTLIB_SUCCESS) {
        /* pY is not valid if any of the three calls failed */
        printf ("TEST FAIL(FFTLIB call returned an error status)\n");
        return 1;
    }

    /* Bit-exact comparison of the output against the reference data */
    size_t lengthBytes = (numPoints * 2) * sizeof (int16_t);
    if (memcmp (pY, staticRefOutput4, lengthBytes) == 0) {
        printf ("TEST PASS(pY & staticRefOutput4 matched)\n");
    } else {
        printf ("TEST FAIL(pY & staticRefOutput4 mismatch)\n");
        return 1;
    }
    return 0;
}
Note: Ensure that the FFTLIB include and library paths from the SDK are added to the CCS project.
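The example allocates numPoints * 2 int16_t elements for the twiddle buffer pW. As a sanity check, the index arithmetic of tw_gen() can be replayed on a host machine to see how many elements it writes. The sketch below copies only the loop structure of tw_gen() from the example; twiddle_extent() and require_twiddle_buffer_fits() are hypothetical helper names, not FFTLIB functions.

```cpp
#include <cassert>
#include <cstdint>

// Replays the loop and index arithmetic of tw_gen() without computing any
// twiddle values. Returns one past the highest pW index that tw_gen() would
// write, i.e. the minimum number of int16_t elements pW must hold.
inline uint32_t twiddle_extent(uint32_t numPoints)
{
    uint32_t t      = numPoints >> 2;
    uint32_t k      = 0;
    uint32_t extent = 0;

    for (uint32_t j = 1; j <= (numPoints >> 2); j <<= 2) {
        for (uint32_t i = 0; i < (numPoints >> 2); i += j) {
            // The highest index touched in one iteration is pW[k + 4*t + 1].
            uint32_t last = k + 4 * t + 1;
            if (last + 1 > extent) {
                extent = last + 1;
            }
            k += 2;
        }
        k += 4 * t;
        t >>= 2;
    }
    return extent;
}

// Hypothetical guard: aborts in debug builds if a buffer of `capacity`
// int16_t elements would be overrun by tw_gen(pW, numPoints).
inline void require_twiddle_buffer_fits(uint32_t numPoints, uint32_t capacity)
{
    assert(twiddle_extent(numPoints) <= capacity);
}
```

For numPoints = 64 this walk ends at 126 elements, which fits in the 128-element buffer allocated in the example.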
We have also published a dedicated FAQ (e2e.ti.com/.../faq-tda4vh-q1-how-to-enable-cache-and-mmu-in-a-standalone-c7x-code-to-maximize-performance) on enabling the cache and MMU in standalone C7x code to maximize performance, which applies equally to FFTLIB. For performance measurement, we typically use the __TSC counter.
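A measurement based on the __TSC counter could be structured as in the sketch below. It is only an illustration: read_ticks() and measure_ticks() are hypothetical helpers, the __C7000__ macro guard and the direct read of __TSC through c7x.h are assumptions about the C7000 toolchain, and the std::chrono branch exists only so that the same code can be compiled and tried on a host PC.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__C7000__)      /* assumption: predefined by the C7000 compiler */
#include <c7x.h>
#endif

// Returns a monotonically increasing tick count: the __TSC counter on a C7x
// target (assumed to be readable after including c7x.h), or nanoseconds from
// a steady clock on a host build.
static inline uint64_t read_ticks()
{
#if defined(__C7000__)
    return (uint64_t) __TSC;
#else
    using clock = std::chrono::steady_clock;
    return (uint64_t) std::chrono::duration_cast<std::chrono::nanoseconds>(
               clock::now().time_since_epoch()).count();
#endif
}

// Runs fn() once and returns the number of ticks that elapsed around it.
// The call overhead of reading the counter is included in the result.
template <typename Fn>
uint64_t measure_ticks(Fn &&fn)
{
    const uint64_t start = read_ticks();
    fn();
    const uint64_t stop = read_ticks();
    assert(stop >= start); // the counter is expected to be monotonic
    return stop - start;
}
```

In the example from this post, the lambda passed to measure_ticks() would wrap only the FFTLIB_fft1d_i16sc_c16sc_o16sc_kernel call, so that the parameter check and initialization are not counted. Measuring with the cache and MMU enabled, as described in the FAQ above, is what makes the numbers representative.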
By default, the FFTLIB test file FFTLIB_fft1d_i16sc_c16sc_o16sc_d.c (located at "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/test/fft_c7x/FFTLIB_fft1d_i16sc_c16sc_o16sc/FFTLIB_fft1d_i16sc_c16sc_o16sc_d.c") has these features enabled and accounts for them in the performance calculations.
We use the function definitions provided in the SDK directly. The corresponding linker command file can be found at "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/cmake/linkers/C7120/lnk.cmd". A linker script for each SoC is available under "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/cmake/linkers" (screenshot attached).

Regards,
Betsy Varughese