RTOS/AM5728: FFTLIB from Processor SDK

Part Number: AM5728
Other Parts Discussed in Thread: FFTLIB

Tool/software: TI-RTOS

I am trying to use the DSPs via OpenCL to do an IFFT. I've had success using DSPLIB, but I now need a non-power-of-two FFT (50 elements, to be exact). I read that FFTLIB has non-power-of-two FFTs, and indeed the fftlib_c66x_2_0_0_2 documentation describes the following (I've included only a short extract for context):

int ifft_sp_1d_c2r_batch_direct (fft_param_u u, void *edmaState)

Parameters:

N = IFFT size
M = Power of 2 IFFT size, if Bluestein algorithm is used

Assumptions:

Batch size is at least 4.
N is a positive value.

This looked promising, but I could not find an ae66 library for the whole project. I did find fftlib_3_1_0_0 in ti-processor-sdk-linux-am57xx-evm-05.01.00.11, which is much more recent and includes the library. The problem is that its function documentation suggests non-power-of-two sizes are more or less unsupported. Was there a rollback in features? Here is a documentation extract from fftlib_3_1_0_0:

void ifft_sp_plan_1d_c2r_batch (int N, int mode, fft_callout_t fxns, fft_plan_t *p,
                                float *in, float *out, float *tw, float *work)

Parameters

N = IFFT size

Assumptions:

Batch size is at least 8.
N is a positive value.
N is multiple of 8.

So my concern is: did the IFFT in the 2_0_0_2 version not work for non-powers of 2? Or was there a feature rollback?
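To make the two quoted constraints concrete for N = 50, here is a minimal sketch in C. It assumes the textbook rule for Bluestein's algorithm, where the padded power-of-two length M must satisfy M >= 2*N - 1; the extract above does not say how FFTLIB itself chooses M, so that rule is an assumption. Both function names are hypothetical helpers and not part of FFTLIB.

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest power of two M with M >= 2*N - 1, the usual minimum
 * convolution length for Bluestein's algorithm (textbook rule,
 * not confirmed to be what FFTLIB uses internally). */
int bluestein_padded_size(int n)
{
    assert(n > 0);
    int m = 1;
    while (m < 2 * n - 1) {
        m <<= 1;
    }
    return m;
}

/* True if N satisfies the "N is multiple of 8" assumption quoted
 * from the fftlib_3_1_0_0 documentation. */
bool meets_multiple_of_8_assumption(int n)
{
    return n > 0 && (n % 8) == 0;
}
```

For N = 50 the first helper returns 128, while the second returns false, which is why the 3_1_0_0 plan function as documented does not cover this size directly.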

  • Louis,

    As you may know, FFTLIB is based on the FFTW library, so the newer versions are optimized versions of the FFTW release that was made available. The only issue I am aware of with the IFFT in this update is that the FFTLIB IFFT function scales its output by 1/N, which does not match the FFTW implementation.
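If FFTW-compatible output is needed, the 1/N scaling mentioned above can be undone after the transform. FFTW computes unnormalized transforms, so its inverse output is N times larger than a 1/N-scaled result. The sketch below is a hypothetical post-processing helper, not an FFTLIB function; `count` is the number of floats in the output buffer and `n` is the IFFT size.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every output sample by n so that a 1/n-scaled inverse FFT
 * result matches FFTW's unnormalized convention. */
void undo_ifft_scaling(float *out, size_t count, int n)
{
    assert(out != NULL && n > 0);
    for (size_t i = 0; i < count; ++i) {
        out[i] *= (float)n;
    }
}
```

For a batched transform, `count` would cover all batches while `n` stays the size of a single IFFT.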

    If you are interested in the implementation from the earlier library, you could either use the older library version or extract the source of the earlier function and use it with your OpenCL implementation. It is also possible that dropping non-power-of-two support is a result of the optimization that was done on the function as part of the library integration on the TI DSP.

    Regards,
    Rahul


  • Thanks for the quick response, Rahul.


    So just to recap, we are unsure why the non-power-of-two support was dropped, but it was most likely due to following the lead of FFTW or a TI DSP optimization. And so, as far as we know, the older version does not have any known bugs that would've caused the feature to be dropped?

    Best,

    Louis

  • Part Number: AM5728

    Tool/software: Code Composer Studio

    Hello,

    I have used the FFTLIB library in my project. At first there seemed to be no problems or errors; however, after the 7th run of my program, it hangs in the “fft_execute” function. Tracking the issue down leads to “EdmaMgr_alloc” in “void *fft_assign_edma_resources(void)”.

    As mentioned in the related question "https://e2e.ti.com/support/processors/f/791/p/754731/2800231", the AM5728 uses only one EDMA instance for the DSP. Accordingly, I have changed the following in the configuration file “fft_c6678_config.c”:

    #define EDMA_MGR_NUM_EDMA_INSTANCES 1

    #define NUM_EDMA_INSTANCES 1

    Moreover, I have changed the "Global Register Region of CC Registers". 

    #define DSP1_EDMA3_CC_BASE_ADDR (0x01D10000)
    #define DSP1_EDMA3_TC0_BASE_ADDR (0x01D05000)
    #define DSP1_EDMA3_TC1_BASE_ADDR (0x01D06000)

    But my program still hangs after exactly the 7th run.

    Could you please help me with this issue?

    Best regards,

    Parian Golchin

  • Parian,

    We need some more information on how you are running this test code. Are you enabling the DSP cache? Where is the code placed? Where is the DSP PC when the hang occurs? Is it in the EDMA code or in the FFT code? Are EDMA transfers bigger than 32KB?

    Please check for existing issues like these that you may be running into :
    e2e.ti.com/.../518566

    Regards,
    Rahul
  • Dear Rahul,

    Are you enabling the DSP cache, where is the code placed? I do not know how to enable or disable it, or whether it is currently enabled.
    Where is the DSP PC when the hang occurs? It hangs in the “EdmaMgr_alloc” function.
    Is it in the EDMA code or in the FFT code? EDMA code.
    Are EDMA transfers bigger than 32KB? No, they are not.

    One more thing worth mentioning: I get two warnings before running my program. The warnings are the following:
    warning #10247-D: creating output section ".ddr_mem" without a SECTIONS specification
    warning #10247-D: creating output section ".ll2_mem" without a SECTIONS specification

    Thank you very much,
    Best regards,
    Parian Golchin
  • Part Number: AM5728

    Tool/software: TI-RTOS

    Hi,

    I build and run the fft_sp_1d_r2c FFTLIB example (C:\ti\fftlib_c66x_2_0_0_2\packages\ti\fftlib\src\fft_sp_1d_r2c\k1\fft_sp_1d_r2c_k1_66_LE_ELF), but I encountered an issue with EDMA3.

    The console output:

    EdmaMgr_alloc() failed  (0)

    Please help me.

    --------------------

    CCS Version: 8.2.0.00007

    EDMA3 Version: 2.12.3

    Framework Component Version: 2.12.3

    PDK Version: 1.0.13

    FFTLIB Version: fftlib_c66x_2_0_0_2

     

    Best regards, Omid

  • I have reached out to the FFTLIB maintainer for his comment and will post a response by the end of this week.
  • I have reached out to FFTLIB maintainer for their comment on this feature and will get back to you by the end of this week.

    Regards,
    Rahul
  • Thanks, Rahul! I'm still very interested in the response.

  • Omid,

    It looks like you are running into the same issue as described by Parian so I am merging the two threads on the same topic.

    I will check with our system test team and confirm that they have run this test case.

    Regards,
    Rahul
  • Omid,

    It appears that the FFTLIB released in 2014 was validated on C66x and K2H platforms.

    For AM572x, FFTLIB is only supported from Processor SDK Linux, where the FFTLIB functionality can be offloaded to the DSP from ARM Linux using OpenCL, as mentioned here:
    wiki.tiprocessors.com/.../Processor_SDK_Libraries

    The version of FFTLIB for the DSP has been upgraded to 3.1.0.0 and should have the required EDMA porting done for the DSP on AM572x.

    Regards,
    Rahul
  • Hi Rahul,

    Thank you for your reply. I will do so.

    Best regards, Omid
  • Hi Rahul,

    Thank you very much for your fast response.

    I have tried FFTLIB 3.1.0.0, and that problem is now solved.

    However, I now have a new issue: "[core 0] Memory allocation error!".

    When I tracked the issue inside the "fft_omp_sp_2d_r2c_ecpy" function, I found that it allocates memory for "data_wLocal, workbuf_wLocal, workbuf_tLocal" by calling the "lib_smem_falloc" function. Please see below:

    data_wLocal = (float*)lib_smem_falloc (fft_mem_handle, 4*N*FFT_OMP_SP_2D_R2C_NUMOFLINEBUFS*sizeof(float), 8);
    workbuf_wLocal = (float*)lib_smem_falloc (fft_mem_handle, 4*N*FFT_OMP_SP_2D_R2C_NUMOFLINEBUFS*sizeof(float), 8);
    workbuf_tLocal = (float*)lib_smem_falloc (fft_mem_handle, 2*N*FFT_OMP_SP_2D_R2C_NUMOFLINEBUFS*sizeof(float), 8);
    However, the allocations fail and all three pointers are NULL.
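To judge whether the fast-memory heap can satisfy these requests at all, it may help to add up what the three calls ask for. The following is a rough sketch, not FFTLIB code: `line_bufs` stands in for FFT_OMP_SP_2D_R2C_NUMOFLINEBUFS, whose value is not given in this thread, and the last argument of lib_smem_falloc (8) is assumed to be a byte alignment. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Total bytes requested by the three lib_smem_falloc calls quoted above.
 * line_bufs is a stand-in for FFT_OMP_SP_2D_R2C_NUMOFLINEBUFS. */
size_t fft_2d_r2c_scratch_bytes(size_t n, size_t line_bufs)
{
    size_t data_w = align_up(4 * n * line_bufs * sizeof(float), 8);
    size_t work_w = align_up(4 * n * line_bufs * sizeof(float), 8);
    size_t work_t = align_up(2 * n * line_bufs * sizeof(float), 8);
    return data_w + work_w + work_t;
}
```

Comparing this total against the size of the heap behind fft_mem_handle would show whether the NULL results come from an undersized heap rather than from the calls themselves.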
    I would greatly appreciate your help!
    Thank you!
    Best regards,
    Parian Golchin
  • Parian,

    The memory allocation problem is a known issue, which has been reported here:
    e2e.ti.com/.../768577

    It appears that when BIOS, EDMA, and dependent components were updated, the memory requirements of FFTLIB increased, so you are running into an issue that we are tracking in our bug system. This library is in maintenance, so please expect some delays.

    I will post a response if I get an intermediate fix, so you don't need to wait for a new release.

    Regards,
    Rahul
  • Hi Rahul,

    Thanks for your reply.

    I will wait for an intermediate fix.

    Best regards,

    Parian