[FAQ] TDA4VH-Q1: How to Use FFTLIB?

Betsy Varughese

Expert 5150 points

Part Number: TDA4VH-Q1
Other Parts Discussed in Thread: FFTLIB

Tool/software:

Hi Team,

Could you please provide the API details along with example code from the SDK demonstrating its usage?

Regards,

Betsy Varughese

4 months ago


TI__Expert 5150 points

The following sections describe the FFTLIB (Fast Fourier Transform Library) kernels, the build and execution guidelines, the common APIs and their parameters, and example code demonstrating the use of these APIs.

FFTLIB Kernels: The current SDK includes implementations of six kernels:

- FFTLIB_fft1dBatched_i16sc_c16sc_o16sc :- Kernel for computing batched 16-bit integer FFT
- FFTLIB_fft1dBatched_i32fc_c32fc_o32fc :- Kernel for computing batched 32-bit floating-point FFT
- FFTLIB_fft1d_i16sc_c16sc_o16sc :- Kernel for computing 16-bit integer FFT
- FFTLIB_fft1d_i32f_c32fc_o32fc :- Kernel for computing 32-bit floating-point real-to-complex FFT
- FFTLIB_fft1d_i32fc_c32fc_o32fc :- Kernel for computing 32-bit floating-point FFT
- FFTLIB_ifft1d_i32fc_c32fc_o32fc :- Kernel for computing 32-bit floating-point inverse FFT

Note: All of the above implementations operate on 1D data, and the data size must be a power of 2. These kernels also require the application to provide pre-generated twiddle factors along with the input data.
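Since the kernels only accept power-of-2 sizes, an application can validate the size before allocating any buffers. The following is a minimal sketch; the helper names isPowerOfTwo and log2OfPowerOfTwo are hypothetical and are not part of FFTLIB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FFTLIB API).
// A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
constexpr bool isPowerOfTwo(uint32_t n)
{
    return n != 0u && (n & (n - 1u)) == 0u;
}

// Hypothetical helper (not an FFTLIB API).
// Returns log2(n) for a power-of-two n by counting right shifts.
inline uint32_t log2OfPowerOfTwo(uint32_t n)
{
    assert(isPowerOfTwo(n));
    uint32_t result = 0;
    while (n > 1u) {
        n >>= 1u;
        ++result;
    }
    return result;
}
```

For example, isPowerOfTwo(64) holds for the 64-point example later in this post, while a size such as 48 would be rejected before any FFTLIB call is made.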

The following naming convention is used in FFTLIB to identify different FFT kernel variants. Each kernel name encodes its functionality, data dimensionality, processing mode, and data access patterns. The figure below describes the nomenclature:

[Figure: FFTLIB kernel naming nomenclature]
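As a rough illustration of the data-type part of a kernel name, the sketch below decodes one tag such as i16sc or i32f. The interpretation is an assumption inferred from the kernel descriptions listed above (i = input, c = twiddle coefficients, o = output, digits = bit width, s = signed integer, f = floating point, trailing c = complex); it is not taken from the FFTLIB documentation, and BufferTag and parseBufferTag are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical decoded form of one tag (assumed meaning, see lead-in).
struct BufferTag {
    char role;       // 'i' input, 'c' twiddle coefficients, 'o' output
    int  bits;       // element width in bits, e.g. 16 or 32
    bool isFloat;    // true for 'f', false for 's' (signed integer)
    bool isComplex;  // true when the tag ends with 'c'
};

// Parses a tag such as "i16sc" or "i32f"; returns false if it is malformed.
inline bool parseBufferTag(const std::string &tag, BufferTag &out)
{
    if (tag.size() < 4) return false;
    out.role = tag[0];
    if (out.role != 'i' && out.role != 'c' && out.role != 'o') return false;

    // Read the decimal bit width that follows the role letter.
    std::size_t pos  = 1;
    int         bits = 0;
    while (pos < tag.size() && std::isdigit(static_cast<unsigned char>(tag[pos]))) {
        bits = bits * 10 + (tag[pos] - '0');
        ++pos;
    }
    if (bits == 0 || pos >= tag.size()) return false;
    out.bits = bits;

    // Element kind: 'f' for floating point, 's' for signed integer.
    if (tag[pos] == 'f')      out.isFloat = true;
    else if (tag[pos] == 's') out.isFloat = false;
    else                      return false;
    ++pos;

    // Optional trailing 'c' marks complex data; anything else is rejected.
    if (pos == tag.size()) { out.isComplex = false; return true; }
    if (pos + 1 == tag.size() && tag[pos] == 'c') { out.isComplex = true; return true; }
    return false;
}
```

Under these assumptions, i32f decodes as a real 32-bit floating-point input, which agrees with the "real-to-complex" description of FFTLIB_fft1d_i32f_c32fc_o32fc.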

For instructions on building and running the FFTLIB kernels, please refer to the FAQ: Build and Run Instructions: mathlib, dsplib and fftlib

Input and Output Parameters Used :

- [in] pX : Pointer to buffer with input data
- [in] bufParamsX : Pointer to the structure containing dimensional information of input buffer
- [in] pW : Pointer to buffer with twiddle factors
- [in] bufParamsW : Pointer to the structure containing dimensional information of twiddle factor buffer
- [out] pY : Pointer to buffer with output data
- [in] bufParamsY : Pointer to the structure containing dimensional information of output buffer
- [in] pBlock : Pointer to the buffer that will hold the streaming engine parameters
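In the 16-bit example later in this post, the data buffers hold complex samples as interleaved real and imaginary int16_t values, which is why dim_x is set to numPoints * 2. The sketch below shows that layout; ComplexI16 and packInterleaved are hypothetical names used only for illustration and are not FFTLIB types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical complex sample type (not an FFTLIB type).
struct ComplexI16 {
    int16_t re;
    int16_t im;
};

// Packs samples as [re0, im0, re1, im1, ...].
// The result has 2 * samples.size() elements, matching dim_x = numPoints * 2.
inline std::vector<int16_t> packInterleaved(const std::vector<ComplexI16> &samples)
{
    std::vector<int16_t> packed;
    packed.reserve(samples.size() * 2);
    for (std::size_t n = 0; n < samples.size(); ++n) {
        packed.push_back(samples[n].re);
        packed.push_back(samples[n].im);
    }
    return packed;
}
```

With this layout, the first two entries of pX in the example (-108, -126) form the first complex input sample.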

Kernel API Definitions:

- <kernel_name>_checkParams() : This function checks the validity of the parameters passed to the <kernel_name>_init and <kernel_name>_kernel functions. It returns a status value indicating success or failure.
- <kernel_name>_init() : This function must be called before the <kernel_name>_kernel function. It takes care of any one-time operations, such as setting up the configuration for the streaming engine.
- <kernel_name>_kernel() : This function is the main kernel compute function.
- tw_gen() : The function used to pre-calculate the twiddle factors.
- <KERNEL_NAME>_PBLOCK_SIZE : The macro defining the array size for storing SE/SA parameters.
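The call order described above (checkParams, then init, then kernel, stopping at the first failure) can be sketched independently of any particular kernel. In the sketch below, Status is a hypothetical stand-in for FFTLIB_STATUS and runKernelSequence is a hypothetical wrapper, so the snippet builds without the SDK; in real code, each callable would wrap the corresponding <kernel_name>_* call.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for FFTLIB_STATUS so that this sketch is self-contained.
enum class Status { Success, Error };

// Runs the three stages in order and stops at the first one that fails.
// Later stages are not invoked once an earlier stage reports an error.
inline Status runKernelSequence(const std::function<Status()> &checkParams,
                                const std::function<Status()> &init,
                                const std::function<Status()> &kernel)
{
    Status status = checkParams();
    if (status == Status::Success) {
        status = init();   // one-time setup, e.g. streaming engine configuration
    }
    if (status == Status::Success) {
        status = kernel(); // main compute call
    }
    return status;
}
```

Because init handles one-time setup, an application that processes many buffers of the same size would presumably call init once and kernel repeatedly; that reuse pattern is an assumption here and should be confirmed against the SDK test drivers.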

The example code for FFTLIB is provided below.

/* Included header files required for fftlib */
#include <fftlib.h>
#include <c7x.h>
#include "common/FFTLIB_bufParams.h"
#include "common/FFTLIB_types.h"
#include <cmath>
#include <math.h>
#include "common/TI_memory.h"
#include <iostream>
#include <cstring>
#include <cstdio>   /* printf */
#include <cstdint>  /* fixed-width integer types */

using namespace std;

uint8_t FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock[FFTLIB_FFT1D_I16SC_C16SC_O16SC_PBLOCK_SIZE];

static int16_t staticRefOutput4[] = {
    79,    -619, 373,  -719, -638,  213,  520,  -711, 370,   -615,  -225, 264,   452,   1007,  5,     -521, 1068, 180,  1451,  600,  150,   303,
    539,   -566, -343, -890, 284,   -710, -261, -789, -278,  -296,  270,  1036,  -1357, -143,  169,   -136, -35,  -922, -617,  -426, 197,   8,
    97,    866,  894,  -56,  -491,  567,  -484, -383, -1229, -1178, -268, -37,   -622,  419,   138,   -358, -85,  -217, 242,   -98,  297,   -479,
    139,   -185, -830, -421, 202,   585,  42,   -661, -359,  634,   -464, -275,  133,   -651,  -1088, 1438, -153, 106,  380,   87,   -725,  -156,
    -1361, -738, 98,   -272, -1301, 595,  -14,  -104, -634,  -770,  -347, -125,  -733,  -1228, 157,   -276, 125,  -30,  -1433, -466, -1001, 218,
    312,   -540, 1099, 31,   138,   369,  539,  1080, 706,   -381,  -770, -1131, 212,   -220,  -93,   1379, -550, -550};

static int16_t float2short (FFTLIB_D64 x)
{
   x = floor (0.5 + x); // Explicit rounding to integer //
   if (x >= 32767.0)
      return 32767;
   if (x <= -32768.0)
      return -32768;
   return (int16_t) x;
}

void tw_gen (int16_t *pW, uint32_t numPoints)
{
   int32_t          i, j, k, t;
   const FFTLIB_D64 PI   = 3.141592654;
   FFTLIB_D64 twF2sScale = 32767.5; /* Scale twiddle factors (max abs value of
                                     * 1) to use full capacity of int16_t */

   t = numPoints >> 2;
   for (j = 1, k = 0; j <= numPoints >> 2; j = j << 2) {
      for (i = 0; i < numPoints >> 2; i += j) {
         /* TODO: Big endian requires different format of Twiddle factors? */
         pW[k]     = float2short (twF2sScale * cos (2 * PI * i / numPoints));
         pW[k + 1] = float2short (twF2sScale * (-sin (2 * PI * i / numPoints)));
         pW[k + 2 * t] =
             float2short (twF2sScale * cos (4 * PI * i / numPoints));
         pW[k + 2 * t + 1] =
             float2short (twF2sScale * (-sin (4 * PI * i / numPoints)));
         pW[k + 4 * t] =
             float2short (twF2sScale * cos (6 * PI * i / numPoints));
         pW[k + 4 * t + 1] =
             float2short (twF2sScale * (-sin (6 * PI * i / numPoints)));
         k += 2;
      }
      k += (t) *4;
      t = t >> 2;
   }
}


int16_t pX[] = {
      -108, -126, 109,  86,   56,  62,   -46, 8,    -88, -101, -78, -127, 67,   -33, -12, 97,  117, 109,  32,   91,  -44, -118, 117,  72,  20,  -43,
      -66,  -77,  82,   19,   31,  -84,  -22, -115, -46, -114, -30, -73,  -115, 25,  118, 25,  -18, 92,   61,   90,  67,  -39,  -21,  88,  11,  85,
      -72,  86,   34,   -107, -11, -113, -54, 125,  -37, 3,    86,  38,   -124, 89,  12,  90,  9,   -13,  -12,  22,  14,  17,   -109, 84,  22,  -16,
      2,    -79,  128,  76,   52,  -53,  -38, -105, 113, 106,  63,  -110, -85,  -68, -50, -82, 71,  -124, 93,   -16, 42,  -122, 78,   -70, -64, -45,
      -118, 57,   -127, 124,  -17, -65,  117, -40,  12,  68,   -70, -76,  6,    -18, -96, -3,  -70, -113, -111, 51,  92,  -8,   104,  -108};

const uint32_t pShift[] = {0, 0, 0};

const uint32_t numPoints = 64;

const uint32_t numShifts = 3;

int main(){
    int16_t *pY, *pW;

    FFTLIB_STATUS status = FFTLIB_SUCCESS;

    pY     = (int16_t *) TI_memalign (64, numPoints * 2 * sizeof (int16_t));
    pW     = (int16_t *) TI_memalign (64, numPoints * 2 * sizeof (int16_t));

    FFTLIB_bufParams1D_t bufParamsData;
    FFTLIB_bufParams1D_t bufParamsShift;

    bufParamsData.dim_x     = numPoints * 2;
    bufParamsData.data_type = FFTLIB_INT16;

    bufParamsShift.dim_x     = numShifts;
    bufParamsShift.data_type = FFTLIB_UINT32;

    tw_gen (pW, numPoints);

    status = FFTLIB_fft1d_i16sc_c16sc_o16sc_checkParams ((int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData, (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);

    if(status == FFTLIB_SUCCESS){
        status = FFTLIB_fft1d_i16sc_c16sc_o16sc_init ( (int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData, (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);

    }
    
    if(status == FFTLIB_SUCCESS){
        status = FFTLIB_fft1d_i16sc_c16sc_o16sc_kernel ((int16_t *) pX, &bufParamsData, (int16_t *) pW, &bufParamsData, (int16_t *) pY, &bufParamsData, (uint32_t *) pShift, &bufParamsShift, FFTLIB_fft1d_i16sc_c16sc_o16sc_pBlock);
    }
    
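    size_t lengthBytes = (numPoints * 2) * sizeof(int16_t);

    /* Compare against the reference only if all three FFTLIB calls succeeded */
    if (status != FFTLIB_SUCCESS) {
        printf("TEST FAIL(FFTLIB call returned an error status)\n");
        return 1;
    }

    if (memcmp(pY, staticRefOutput4, lengthBytes) == 0) {
        printf("TEST PASS(pY & staticRefOutput4 matched)\n");
    } else {
        printf("TEST FAIL(pY & staticRefOutput4 mismatch)\n");
    }

    return 0;

}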

Note: Ensure that the include and library paths for FFTLIB from the SDK are added in CCS.
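When trying the example with your own input data, no pre-computed reference array is available. One option is to compare the kernel output against a naive double-precision DFT. The sketch below assumes interleaved real/imaginary data and an unscaled forward transform (all shift values zero, as in the example above). This is consistent with the example data: the first reference bin (79, -619) equals the sum of the real and imaginary input samples. The names referenceDft and matchesReference, and the tolerance to use, are assumptions and not part of FFTLIB.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive O(N^2) forward DFT on interleaved data [re0, im0, re1, im1, ...]:
// Y[k] = sum over m of X[m] * exp(-j * 2 * pi * k * m / N)
inline std::vector<double> referenceDft(const std::vector<int16_t> &x)
{
    assert(x.size() % 2 == 0);
    const std::size_t   n     = x.size() / 2;
    const double        twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> y(2 * n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t m = 0; m < n; ++m) {
            // Reduce k*m modulo N to keep the angle small and accurate.
            const double angle = -twoPi * static_cast<double>((k * m) % n) / static_cast<double>(n);
            const double c  = std::cos(angle);
            const double s  = std::sin(angle);
            const double re = x[2 * m];
            const double im = x[2 * m + 1];
            y[2 * k]     += re * c - im * s;
            y[2 * k + 1] += re * s + im * c;
        }
    }
    return y;
}

// True when every output sample is within tol of the reference value.
inline bool matchesReference(const int16_t *out, const std::vector<double> &ref, double tol)
{
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (std::fabs(static_cast<double>(out[i]) - ref[i]) > tol) return false;
    }
    return true;
}
```

A fixed-point kernel rounds at each stage, so a small tolerance of a few least-significant bits is likely more appropriate than an exact match for arbitrary inputs; the right value depends on the FFT size and the shift settings.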

We have also published a dedicated FAQ (e2e.ti.com/.../faq-tda4vh-q1-how-to-enable-cache-and-mmu-in-a-standalone-c7x-code-to-maximize-performance) on enabling the cache and MMU in standalone C7x code to maximize performance; it applies equally to FFTLIB. For performance measurement, we typically use the __TSC counter.
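A minimal sketch of such a measurement is shown below. It reads __TSC when built with the C7000 compiler and falls back to std::chrono on a host PC so that the snippet can be tried without the target. The __C7000__ predefined macro check and the helper names readTicks and measureTicks are assumptions for illustration; on the target the result is in CPU cycles, while on a host it is in nanoseconds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__C7000__)
#include <c7x.h> // provides __TSC on C7x (assumed to be guarded by __C7000__)
#endif

// Hypothetical helper: returns the current tick count.
inline uint64_t readTicks()
{
#if defined(__C7000__)
    return static_cast<uint64_t>(__TSC); // CPU time stamp counter, in cycles
#else
    // Host fallback: monotonic clock, in nanoseconds.
    using clock = std::chrono::steady_clock;
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now().time_since_epoch()).count());
#endif
}

// Hypothetical helper: runs fn once and returns the elapsed ticks.
template <typename Fn>
uint64_t measureTicks(Fn &&fn)
{
    const uint64_t start = readTicks();
    fn();
    const uint64_t stop = readTicks();
    return stop - start;
}
```

In the example above, the callable passed to measureTicks would wrap only the FFTLIB_fft1d_i16sc_c16sc_o16sc_kernel call, so that twiddle generation and init are excluded from the measured interval.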

By default, the FFTLIB test driver (the _d.c file located at "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/test/fft_c7x/FFTLIB_fft1d_i16sc_c16sc_o16sc/FFTLIB_fft1d_i16sc_c16sc_o16sc_d.c") has these features enabled and accounts for them in the performance calculations.

The example uses the function definitions provided in the SDK directly. The corresponding linker command file can be found at "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/cmake/linkers/C7120/lnk.cmd". For each SoC, the corresponding linker script is available under "ti-processor-sdk-rtos-j784s4-evm-11_00_00_06/fftlib/cmake/linkers" (screenshot attached).

Regards,

Betsy Varughese
