AUDIO-AM275-EVM: FFTLIB: Board freeze observed on FFT & IFFT kernels' wrapper implementation test runs

Part Number: AUDIO-AM275-EVM
Other Parts Discussed in Thread: FFTLIB

Hello Team,

I am implementing a generic FFT wrapper on top of the FFTLIB kernels from AM275-FREERTOS-SDK v11.00.00.16, to be run on the C7x core: FFTLIB_fft1d_i32f_c32fc_o32fc for the real-to-complex FFT and FFTLIB_ifft1d_i32fc_c32fc_o32f for the complex-to-real IFFT. I used the test drivers in fftlib/test/fft_c7x/FFTLIB_fft1d_i32f_c32fc_o32fc/FFTLIB_fft1d_i32f_c32fc_o32fc_d.c and fftlib/test/fft_c7x/FFTLIB_ifft1d_i32fc_c32fc_o32f/FFTLIB_ifft1d_i32fc_c32fc_o32f_d.c as a reference for my wrappers, and accordingly allocated the process buffers inside the wrappers with 128-byte alignment.

However, I notice that even the buffers external to the library seem to need 128-byte alignment, which is a subtle requirement. If I allocate the external buffers with a plain malloc(), execution stalls, the target board (AM275-EVM) freezes in an unrecoverable state, and I have to power-cycle it. Please note that all of these buffers, external and internal, are allocated on the heap. I have reproduced the problem and tried a few approaches to fix it in 4 separate CCS projects, attached here in the .zip file:

 # | CCS Project  | Implementation Details | Observation | Implication
 1 | ti-fft       | malloc() used for the external buffers (not aligned), minimal buffer copies | Board freezes during execution | Desirable way
 2 | ti-fft-expt1 | No malloc() usage; all external buffers and library process buffers are 128-byte aligned | Successful execution | Expectation too impractical
 3 | ti-fft-expt2 | Same as 1, except that the wrapper uses a couple more buffers | Board freezes during execution | Not desirable with added buffers
 4 | ti-fft-expt3 | Same as 3, except that the extra buffer allocation is managed based on the operation | Board freezes during execution | Still acceptable
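
For readers comparing rows 1 and 2, the following is a minimal sketch of allocating a heap buffer on a 128-byte boundary instead of using plain malloc(). It assumes a C11 runtime that provides aligned_alloc(); whether the C7x toolchain offers it, or a different aligned allocator, should be checked against the compiler documentation. The helper names (alloc_aligned_floats, is_fft_aligned, round_up_to_align) are hypothetical and not part of FFTLIB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment used for the process buffers in this thread. */
#define FFT_BUF_ALIGN 128u

/* Round a byte count up to the next multiple of FFT_BUF_ALIGN.
 * C11 aligned_alloc() expects the size to be a multiple of the alignment. */
static size_t round_up_to_align(size_t bytes)
{
    return (bytes + FFT_BUF_ALIGN - 1u) / FFT_BUF_ALIGN * FFT_BUF_ALIGN;
}

/* Hypothetical helper: allocate room for n floats on a 128-byte boundary.
 * Returns NULL on failure; release the buffer with free(). */
static float *alloc_aligned_floats(size_t n)
{
    assert(n > 0);
    return (float *)aligned_alloc(FFT_BUF_ALIGN,
                                  round_up_to_align(n * sizeof(float)));
}

/* Hypothetical helper: check a pointer before handing it to a kernel. */
static int is_fft_aligned(const void *p)
{
    return ((uintptr_t)p % FFT_BUF_ALIGN) == 0u;
}
```

A check such as is_fft_aligned() at the wrapper boundary turns a silent misalignment into an immediate, debuggable failure rather than a board freeze.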

It would be good to hear expert advice on this issue. I would like to understand why this behaviour is seen and what a possible solution to the problem would be.

PS: The projects contain copies of the dependent kernels for building the test applications. The dependencies may need to be adjusted for a successful build.

Regards,
Sreekanth

ti-fftlib-test.zip

  • Hi Sreekanth,

    I am working on understanding the issue. Please expect a response by EOD tomorrow.

    Thanks,

    Shreyansh

  • Thanks for the response. Just an update: I was able to run some more experiments and have come up with what I believe is a solution.

    I had to manually reconstruct the complex-conjugate half of the spectrum, combine it with the half spectrum provided as input, and pass the full two-sided spectrum to the IFFT kernel FFTLIB_ifft1d_i32fc_c32fc_o32f. It now seems to work as expected, as you can see in the attached CCS project. Can you confirm that this is the right way to do it?

    ti-fft-expt4.zip

    However, I have assumed that the IFFT input data format is an array of the format <xn,jyn> for n = 0 to N-1. Can you confirm that my assumption is correct that the complex data is interleaved?
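
The reconstruction described above can be sketched as follows. This is an illustration only, not the code from the attached project: it assumes the half spectrum holds bins 0 to N/2 (N/2 + 1 bins) as interleaved (real, imaginary) float pairs, that N is even, and it uses the conjugate symmetry X[N-k] = conj(X[k]) of a real signal's spectrum. The function name expand_half_spectrum is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* half: interleaved (re, im) pairs for bins 0..n/2, i.e. n + 2 floats.
 * full: interleaved (re, im) pairs for bins 0..n-1, i.e. 2 * n floats.
 * n is the FFT length and must be even. */
static void expand_half_spectrum(const float *half, float *full, size_t n)
{
    size_t k;

    assert(n >= 2 && n % 2 == 0);

    /* Copy the bins that are already present: 0..n/2. */
    for (k = 0; k <= n / 2; ++k) {
        full[2 * k]     = half[2 * k];
        full[2 * k + 1] = half[2 * k + 1];
    }

    /* Fill bins n/2+1..n-1 with the conjugates of bins n/2-1..1. */
    for (k = 1; k < n / 2; ++k) {
        full[2 * (n - k)]     =  half[2 * k];
        full[2 * (n - k) + 1] = -half[2 * k + 1];
    }
}
```

For example, the real sequence {1, 2, 3, 4} has the half spectrum {10, -2+2j, -2}, which expands to the full spectrum {10, -2+2j, -2, -2-2j}.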

    Also, I was misled a bit by the FFTLIB_ifft1d_i32fc_c32fc_o32f kernel document which says:

    Kernel for computing 32-bit floating-point complex to real IFFT. The kernel performs IFFT of an N-point complex sequence using N/2-point complex IFFT to output a real sequence, which saves compute time

    This gives the impression that the kernel only needs N/2 points as input and that the reconstruction is handled by the kernel internally.

    Can you also confirm whether a runtime stack or heap analyzer, or any other runtime memory debug feature, is available in CCS or CGT, and point me to the relevant documentation if any? I only see four tools in CCS 20.2 (from the View menu): "Memory", "Memory Allocation", "Memory Map" and "Stack Usage", of which the latter three seem to be static analyzers. "Memory" can be used to validate data, but it is not much use for detecting heap and stack overflows.

    Any help in addressing the above few questions is greatly appreciated.

  • Hi Sreekanth,

    However, I have assumed that the IFFT input data format is an array of the format <xn,jyn> for n = 0 to N-1. Can you confirm that my assumption is correct that the complex data is interleaved?

    That's correct. The complex data is in the interleaved format.

    This gives the impression that the kernel only needs N/2 points as input and that the reconstruction is handled by the kernel internally.

    You are right. I am raising a JIRA to get the documentation fixed. We are also planning to add the complex-conjugate half-spectrum generation to the kernel in the next release.

    I am a little out of context here as to whether this is the right solution. Please correct me if I am wrong: if the reason for the earlier board crashes was that you were passing only N/2 points to the kernel, then this is the correct solution.

    Can you also confirm whether a runtime stack or heap analyzer, or any other runtime memory debug feature, is available in CCS or CGT, and point me to the relevant documentation if any? I only see four tools in CCS 20.2 (from the View menu): "Memory", "Memory Allocation", "Memory Map" and "Stack Usage", of which the latter three seem to be static analyzers. "Memory" can be used to validate data, but it is not much use for detecting heap and stack overflows.

    I will pass on this query to another expert who will assist you with this.

    Thanks,

    Shreyansh

  • Hi Shreyansh,

    Thanks for your responses.

    I am a little out of context here as to whether this is the right solution. Please correct me if I am wrong: if the reason for the earlier board crashes was that you were passing only N/2 points to the kernel, then this is the correct solution.

    That's correct. The board crashes were seen when I was passing N/2 points to the kernel.

    I will pass on this query to another expert who will assist you with this.

    Sure, an expert validation would be great.

    I am raising a JIRA to get the documentation fixed. We are also planning to add the complex-conjugate half-spectrum generation to the kernel in the next release.

    This would be great. It may help save some CPU cycles if the streaming engine handles it.

    Regards,
    Sreekanth

  • Hi Sreekanth,
    I checked with the expert. Unfortunately, we don't have any tool for runtime stack detection.

    The method we typically use is to fill the stack with a known value (e.g., 0xDEAD) before running the code. CCS can do this from a memory window: one of the icons in the memory window has an Import Data option, which you can use to fill the entire stack. Then, whenever you halt after running the code, you can inspect the stack and see how much of it has been used.
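
The fill-and-inspect idea can also be sketched in C. This is only an illustration of the technique on an ordinary array standing in for the stack region; it does not paint the live stack, and it assumes a stack that grows downward toward the base address, so untouched words are counted from the low end. The 32-bit pattern 0xDEADDEAD and the function names are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Known fill pattern (assumed 32-bit variant of the 0xDEAD value). */
#define PAINT_WORD 0xDEADDEADu

/* Fill a memory region with the known pattern before the code runs. */
static void paint_region(uint32_t *base, size_t words)
{
    size_t i;
    for (i = 0; i < words; ++i) {
        base[i] = PAINT_WORD;
    }
}

/* Count the words at the low-address end that still hold the pattern.
 * With a downward-growing stack, this is the headroom that was never used. */
static size_t untouched_words(const uint32_t *base, size_t words)
{
    size_t i = 0;
    while (i < words && base[i] == PAINT_WORD) {
        ++i;
    }
    return i;
}
```

A value of zero from untouched_words() means the pattern was overwritten all the way to the base of the region, which suggests the stack was exhausted or overrun.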

    Another thing you could do is set a watchpoint (similar to a breakpoint, but it watches data accesses instead of program fetches) covering a region at the end of the stack. Configure the watchpoint to watch data writes. The processor will then halt if it attempts to write to that region, meaning you are running out of stack. In CCS, you set a watchpoint the same way as a breakpoint, but choose 'Watchpoint'.

  • Hi Sreekanth,
    Yes, that's correct.

    Thanks,
    Shreyansh