This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

RTOS/XTCIEVMK2LX: Input length of FFT limitation

Part Number: XTCIEVMK2LX

Tool/software: TI-RTOS

Hello, experts.

This is related to my previous question and just came into my mind. ;) By the document, the input length of the FFT function, DSPF_sp_fftSPXSP(), is limited to 131072. Is there any special reason for that? Hardware restrictions? or other reason? I need to know this because I'm planning to implement my own FFT function using the function or modify it to meet my purpose (very long input length) as suggested in the previous answer. If there was any critical problem to make the input length much longer than provided by the library function, I need to reconsider my plan.

Thank you very much!

  • Hi,

    The RTOS team have been notified. Their feedback will be posted directly here.

    Best Regards,
    Yordan
  • Hello,


    I am not familiar with this function (DSPF_sp_fftSPXSP()).  Perhaps you meant something along the lines of DSP_fft32x32() in the DSPlib.  But these are software functions implementing FFTs/iFFTs.  Have you investigated the two hardware FFTC blocks in K2L?  They can perform up to 8192point FFTs/iFFTs by simply sending them a QMSS/pktDMA descriptor.

    The FFTC user guide can be found here: 

    And the datasheet is here:

  • Hello,

    Thank you for your attention. I'm not quite sure about the difference between 'input length' and 'point' but I have about 20M-length data to FFT, and the software implementation of FFT from DSPLIB has input length limit up to 131072, which is only 0.13M. The hardware FFT you mentioned about has ability to perform FFT up to 8192 points, can I use the HW FFTC block in my case? If the 'input length' and 'point' refer to the same thing, then I think I can't.

    Thank you! :)

  • Typically an FFT is run on a certain amount of data, either real or complex data points, and usually a power of 2. FFT sizes of 256 points, 512 points, 1024 points, etc. are common. Each point is a real or complex value, normally 32-bits, but sometimes 16 or even 8. So the data length is usually 4x the number of points - e.g. a 2048 point FFT will have 8KB input data. The K2L FFTC hardware accelerators can handle up to a 8192 point FFT. If you need a larger FFT then you either need to do it via software, or figure out if it is possible to call the FFTC multiple times and stitch the results together. From your descriptions it is not clear what size FFT you actually need.

    The DSPLib limitation of 128KB (131072) appears to be an assembly language limitation, to allow for fast optimized loop unrolling.
  • Once a monolithic FFT's working set is too large to fit in cache, performance drops very sharply. For C66x parts, L2 cache tops out at 1 MB, which for a complex float FFT works out to 128K input or output points plus 128K twiddle factors. (Some SoCs with C66x cores have less L2 cache, which means they should limit themselves to monolithic FFTs with N=64K or even less.)

    TI's FFTLIB supports larger FFTs. It breaks the overall FFT into smaller chunks, and uses the hardware DMA engines to transfer data around while the DSP core is working on one of those chunks, and can even split the FFT workload across multiple cores. It is well worth a look before you write your own FFT function.
  • Thank you for your kind answer, db_woodall.

    So, you meant that the input length is a total length of input data in byte, didn't you? Suppose I have two points in float data type, then the length is 2 X 4 bytes = 8 bytes? If I use the 8192-point FFT on float data, then the input data length is up to 32KB?

    In the API reference of DSPLIB, the description of DSPF_sp_fftSPxSP() is as follows:

    -----------------------------------------------------------------

    void DSPF_sp_fftSPxSP ( int N, float * ptr_x, float * ptr_w, float * ptr_y, unsigned char * brev, int n_min, int offset, int n_max ) 

    Parameters:
    - N length of FFT in complex samples
    - ptr_x pointer to complex data input
    - ptr_w pointer to complex twiddle factor
    - ptr_y pointer to complex output data
    - brev pointer to bit reverse table containing 64 entries
    - n_min should be 4 if N can be represented as Power of 4 else, n_min should be 2
    - offset index in complex samples of sub-fft from start of main fft
    - n_max size of main fft in complex samples

    Algorithm:
    DSPF_sp_fftSPxSP_cn.c is the natural C equivalent of the optimized linear assembly code without restrictions. Note that the linear assembly code is optimized and restrictions may apply.

    Assumptions:
    - N needs to be power of 2
    - 8 <= N <= 131072, 32 <= N for linear assembly implementation
    - Arrays pointed by ptr_x, ptr_w, and ptr_y should not overlap
    - Arrays pointed by ptr_x, ptr_w, and ptr_y should align on the double words boundary

    Implementation Notes:
    - Interruptibility: The code is interrupt-tolerant but not interruptible.
    - Endian Support: The code supports both big and little endian modes.

    -----------------------------------------------------------------

    The parameter, N, is the length of FFT in the description. It seems, however, that N is used as a number of point in the example project of DSPLIB, FFT_SP_Example_66_LE_ELF (fft_example_sp.c) as follows:

    /* Number of samples for which FFT needs to be calculated */
    #define N 256
    
    ... 
    
    /* Align the tables that we have to use */
    #pragma DATA_ALIGN(x_ref, 8);
    float   x_ref [2*N];
    
    #pragma DATA_ALIGN(x_sp, 8);
    float   x_sp [2*N];
    #pragma DATA_ALIGN(y_sp, 8);
    float   y_sp [2*N];
    #pragma DATA_ALIGN(w_sp, 8);
    float   w_sp [2*N];
    
    ...
    
    void main () {
        /* Generate the input data */
        generateInput (NUM_SIN_WAVES);
        
        /* Genarate twiddle factors */
        gen_twiddle_fft_sp(w_sp, N);
    
        /* Call FFT routine */
        DSPF_sp_fftSPxSP(N, x_sp, w_sp, y_sp, brev, 4, 0, N);
    
        /* Call the test code to seperate the real and imaginary data */
        seperateRealImg ();
    }

    The input/ouput arrays, x_sp, w_sp, and y_sp, have 2*N values because the real value and imaginary value are interleaved. So we can say it's N point data. The input length, in this case, seems to mean the number of point, not the total data size in byte by the document. In other words, the input length limitation of DSPF_sp_fftSPxSP() is 131072 points, which is 131072 * 4 bytes = total 512KB.

    I think the number of points in the context of FFTC and the input length in the context of the software FFT denote the same stuff. Is my guess right?

    Anyway, I could get some feel from your answer. Thank you very much! :)

  • Thank you for giving me the good insight into the internals! :)
  • If there's nothing else, let's close this ticket.
  • It's still vague about whether the input length limitation of the DSPLIB functions (8 <= N <= 131072) denotes the number of points or total size in Byte, but anyway I think I can figure it out by some experiments by my self, so, yes let's close this ticket! ;)

    Thank you very much.
  • N is the number of points, not the number of bytes.
  • Thank you for making sure of it! Now it's clear. :)