TMS320C28346: RFFT32 does not work with 32768 points

Dmitry Ischuk

Part Number: TMS320C28346
Other Parts Discussed in Thread: CONTROLSUITE, C2000WARE

Hello all!

I can't make RFFT32u (and alignment-requiring RFFT32) function work with 32768 input points (15 stages). CPU can't left outer loop when I look in .asm sources of library during debug.

16384 points (14 stages) works well.

Memory: all 3 buffers - inbuf, outbuf and cossinbuf are located in joined memory blocks RAMH01, RAMH23 and RAMH45.

Stack: 1920 words in joined RAMM01.

Heap (.esysmem): 2048 words in joined RAML07

Library: tried different versions from 1.31 to 1.50.

Compiler: tried different versions, 18.1.3 was recently used.

The project itself is non-TI-RTOS and rather simple, about 30 kWords of memory is still available.

Hope for some solution to be found.

Dmitry.

over 6 years ago

0 Sira Rao80 over 6 years ago

TI__Mastermind 26110 points

Dmitry,

I will work with you to try to resolve this issue.

A few clarifying questions:

1. "CPU can't left outer loop when I look in .asm sources of library during debug." Do you mean the CPU is stuck in the outer loop?
2. "Library: tried different versions from 1.31 to 1.50." I presume this is the FPU FFT you are working with? Are you using C2000Ware or ControlSUITE?
3. Are you running a TI example project or have you integrated the TI library into your separate project which you are trying to run? From your statement "the project itself is non-TI-RTOS", it indicates to me the answer to this is the latter, but I wanted to confirm.

Thanks,
Sira

0 Sira Rao80 over 6 years ago in reply to Sira Rao80

TI__Mastermind 26110 points

Dmitry,

Are you running into a memory limitation issue?

typedef struct {
    float *InBuf;        //!< Pointer to the input buffer
    float *OutBuf;       //!< Pointer to the output buffer
    float *CosSinBuf;    //!< Pointer to the twiddle factors
    float *MagBuf;       //!< Pointer to the magnitude buffer
    float *PhaseBuf;     //!< Pointer to the phase buffer
    uint16_t FFTSize;    //!< Size of the FFT (number of real data points)
    uint16_t FFTStages;  //!< Number of FFT stages
} RFFT_F32_STRUCT;

This structure, where float is a single-precision (32-bit, i.e. 4-byte) value, indicates you need 4 bytes x 16384 = 64 KB for each of InBuf, OutBuf, CosSinBuf, MagBuf, and PhaseBuf.

With a 32768 point FFT, you would need double that overall requirement.

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

The documentation says that the RFFT32(u) function does not use MagBuf and PhaseBuf, because magnitude and phase are calculated later by different functions. For 16384 points those fields were set to point to non-existent memory areas and RFFT worked well.

For 3 buffers (InBuf, OutBuf and CosSinBuf) 32k (number of points) x 2 (size of float32 in words) x 3 (number of buffers) = 192 kwords - exactly the capacity of RAMH0..5 memory blocks, divided into 3 areas for those buffers.
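The arithmetic above can be checked with a small sketch. It assumes a C28x-style addressable word of 16 bits, so one 32-bit float occupies 2 words. The helper name rfft_buffer_words is illustrative and is not part of the TI library.

```c
#include <stdint.h>

/* Assumption: one addressable word is 16 bits, so a 32-bit float takes 2 words. */
#define WORDS_PER_FLOAT 2u

/* Hypothetical helper: words needed for `buffers` float arrays
 * of `points` elements each. */
static uint32_t rfft_buffer_words(uint32_t points, uint32_t buffers)
{
    return points * WORDS_PER_FLOAT * buffers;
}
```

With these assumptions, rfft_buffer_words(32768, 3) gives 0x30000 words (192 kwords), and rfft_buffer_words(16384, 3) gives 0x18000 words (96 kwords).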

Stack and heap memory requirements are not clear.

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello,

1. Sorry for typo - CPU can't leave the loop. When I pause the debug I can see that CPU runs forever in _rfft_f32_Inner_Loop cycle of the _rfft_f32_OuterLoop part of RFFT_f32.asm source file.
2. I use FPU DSP Software library provided with ControlSuite. Versions 1.31, 1.40 and 1.50 were tried with the same result.
3. Function call was made from a custom project, not an example project. More than half of RAML0...7 block is free, stack and heap were set to 2 kwords.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Dmitry,

Thanks for the update. Good catch on the documentation note on MagBuf and PhaseBuf. Your calculation seems correct.

I am still in the process of confirming this, but it might be that each buffer cannot exceed a 64k-halfword size; the algorithm uses register indirect addressing XARn[ARm] – the 22-bit base address goes in XARn, but the 16-bit offset goes in ARm. So with a 16-bit offset, that would mean 64k bytes i.e. 32k words, which is consistent with the 16384 point FFT working, but not the 32768 point FFT.

While I confirm this, it would be good to check on the following:

1. I am assuming when you mean "RFFT worked well for 16384 points" you actually verified this by plotting the OutBuf (or some such technique).
2. How are you specifying what is in InBuf?
3. I presume the linker command file stays the same for both the 16384 and 32768 point FFT. Have you compared the generated .map files to see if there's anything amiss? For example, are InBuf, OutBuf, and CosSinBuf assigned to RAMHx locations as you expect, and to the right addresses so that they exactly fit in the 192k words space? Are MagBuf and PhaseBuf assigned to null pointers?
4. If there is no dynamic memory allocation in your project (there isn't in the FFT library), the Heap requirement would be 0. You can verify this by looking for the .esysmem section in the linker command file, and then look for it in the .map file. In the TI example project, it is present in the .cmd file but not in the .map file, indicating no heap is needed.
5. You can look at the stack requirement by inspecting the .map file and checking the length.
6. This is for my own curiosity - I am curious to know why you need such large size FFTs. What's the application?

Thanks,
Sira

0 Sira Rao80 over 6 years ago in reply to Sira Rao80

TI__Mastermind 26110 points

Dmitry,

Updates:
1. On the addressing issue I mentioned in my previous post, 16-bit offset would mean 64k words (not 64k bytes), since each addressable location in the device is a word. So, in theory, yes, 32768 point RFFT should be possible.
2. However, we should limit our focus to rfft_f32u i.e. the unaligned RFFT function, so that the buffer does not have to be aligned to a 2 x FFTSize boundary.
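
To make the word-versus-byte arithmetic concrete, here is a small sketch of how a 16-bit quantity behaves at these sizes. It is only an illustration, assuming a 16-bit offset register and 2 words per float. It is not taken from the library source and does not claim to identify the root cause.

```c
#include <stdint.h>

/* Words spanned by a float buffer of `points` elements
 * (assumption: 2 words per 32-bit float). */
static uint32_t span_words(uint32_t points)
{
    return points * 2u;
}

/* What remains of a value once it is truncated to a 16-bit register. */
static uint16_t as_16bit(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}
```

For 16384 points the span is 0x8000 words and survives truncation unchanged. For 32768 points the last valid offset, 0xFFFF, still fits in 16 bits, but the span itself (0x10000) truncates to 0. Any 16-bit quantity that holds the full buffer length, a one-past-the-end offset, or 2 x FFTSize (FFTSize is a uint16_t in the structure quoted earlier) would therefore wrap at this size.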

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

1. The result was saved to the file on PC by RTS library functions (fopen and fprintf), then plotted.

2. For 16384 points it was read from a file; for 32768 points it was generated as a sum of 3 harmonics to avoid any issues with RTS functions and stack/heap leakage.

3. I don't have the .cmd and .map files at hand right now, but memory was organized this way:

> 3 blocks of memory were defined: RAMH01 , RAMH23 and RAMH45, starting at 0x300000, 0x310000 and 0x320000, each one 0x10000 words long.

> 3 sections were placed in those areas: .inbuf, .outbuf, .cossinbuf

> 3 arrays of float32 were declared, each as #pragma DATA_SECTION(f32InBuf, ".inbuf") followed by float32 f32InBuf[32768]; etc.

> RFFT data structure was filled with pointers to those arrays (size and stages were set to 32768 and 15).
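
The setup described above could look roughly like the following sketch. The structure is the one quoted earlier in this thread, redeclared here only so the fragment stands alone. The typedef float32, the array names f32OutBuf and f32CosSinBuf, and the helper rfft_setup are assumptions for illustration; only f32InBuf and the section names appear in the post.

```c
#include <stdint.h>

typedef float float32;  /* assumption: stand-in for the TI float32 typedef */

/* Structure as quoted earlier in this thread. */
typedef struct {
    float32 *InBuf;
    float32 *OutBuf;
    float32 *CosSinBuf;
    float32 *MagBuf;
    float32 *PhaseBuf;
    uint16_t FFTSize;
    uint16_t FFTStages;
} RFFT_F32_STRUCT;

#define RFFT_SIZE   32768u
#define RFFT_STAGES 15u     /* log2(32768) */

/* DATA_SECTION is a TI compiler pragma; other compilers ignore it,
 * possibly with a warning. */
#pragma DATA_SECTION(f32InBuf, ".inbuf")
float32 f32InBuf[RFFT_SIZE];
#pragma DATA_SECTION(f32OutBuf, ".outbuf")
float32 f32OutBuf[RFFT_SIZE];
#pragma DATA_SECTION(f32CosSinBuf, ".cossinbuf")
float32 f32CosSinBuf[RFFT_SIZE];

/* Hypothetical helper: fill the structure before calling the library.
 * `unused` is whatever address MagBuf and PhaseBuf should point to. */
static void rfft_setup(RFFT_F32_STRUCT *r, float32 *unused)
{
    r->InBuf     = f32InBuf;
    r->OutBuf    = f32OutBuf;
    r->CosSinBuf = f32CosSinBuf;
    r->MagBuf    = unused;
    r->PhaseBuf  = unused;
    r->FFTSize   = (uint16_t)RFFT_SIZE;
    r->FFTStages = (uint16_t)RFFT_STAGES;
}
```

Note that 32768 is the largest power of two that FFTSize can hold, since the field is 16 bits wide.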

.map file and memory usage in CCS showed that everything was located in its place.

For 16384 points everything was the same, but sections of memory obviously were only half-filled.

Btw: a bug in CCS was found: when you want to see the array of data behind some pointer in the Expressions view, by choosing that pointer and then clicking "Display as array...", CCS shows 16384 float32 records correctly and then jumps back by 64k words of memory.

Magbuf and Phasebuf pointers were set to 0x340000. A NULL pointer seems rather dangerous to me - the RAMM01 block exists and begins at address 0x0.

4 and 5. No dynamic memory allocations were used. Even RTS functions were disabled for testing. Heap and stack problems are always hard to detect, so both of them were increased to some large enough values.

6. The application is to detect some narrow-band signal in a data array from the ADC, so such a large FFT is needed. Thinning or dividing the input data will decrease the maximum frequency of the spectrum.

7. rfft_f32u function was used (aligned variant gave the same result).
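
The trade-off behind item 6 can be sketched numerically. The sample rate used in the note below is hypothetical (the thread does not state it), and the helper names are illustrative.

```c
/* Width of one FFT bin in Hz for sample rate fs_hz and n_points real samples. */
static double bin_width_hz(double fs_hz, unsigned long n_points)
{
    return fs_hz / (double)n_points;
}

/* Highest representable frequency after keeping only every
 * `factor`-th sample of the input (thinning). */
static double nyquist_after_thinning_hz(double fs_hz, unsigned long factor)
{
    return fs_hz / (2.0 * (double)factor);
}
```

With an assumed sample rate of 65536 Hz, a 32768-point FFT gives 2 Hz bins and a 16384-point FFT gives 4 Hz bins. Thinning the input by 2 to keep the bin width with fewer points lowers the highest representable frequency from 32768 Hz to 16384 Hz.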

Dmitry.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Dmitry,

Thanks for the feedback.

1. "Magbuf and Phasebuf pointers were set to 0x340000" - was this by the linker or by you? I ask because when I look at the memory map of the C28346, 0x340000 falls in Reserved space.

2. "Btw: bug in CCS was found: when you want to see the array of data from some pointer in expressions area, choosing that pointer and then clicking ...display as array, CCS shows 16384 records of float32 correct and then jumps back on 64k words of memory."

CCS bugs, especially in newer versions, are quite possible. Does the above mean that for a 32768 point FFT, the Expressions view displays InBuf correctly for the first 16384 values, and then displays incorrect values? Did you verify from View-Memory that the InBuf actually contains the desired 32768 values?

3. When you run the code, how does the OutBuf change? Presuming you have a fixed InBuf, then the OutBuf should also be fixed, so perhaps you can try "corrupting" the OutBuf manually, then run the code and see if it changes at all or does not.

4. For the 32768 case, can you send me the .cmd and .map files? I'd like to take a look.

5. One more thing I'd like you try is the 16384 point CFFT (since its memory requirements should match the 32768 point RFFT). Is this something you can try?
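
For the check suggested in item 3, one possible approach is to fill OutBuf with a sentinel value before the call and count how many entries differ afterwards. This is a minimal sketch; the helper names and the choice of sentinel are assumptions.

```c
#include <stdint.h>

/* Overwrite every element of buf with a recognizable sentinel value. */
static void fill_sentinel(float *buf, uint32_t n, float sentinel)
{
    uint32_t i;
    for (i = 0; i < n; ++i) {
        buf[i] = sentinel;
    }
}

/* Count the elements that no longer hold the sentinel.
 * NaN results also count as changed, because NaN != sentinel is true. */
static uint32_t count_changed(const float *buf, uint32_t n, float sentinel)
{
    uint32_t i;
    uint32_t changed = 0;
    for (i = 0; i < n; ++i) {
        if (buf[i] != sentinel) {
            ++changed;
        }
    }
    return changed;
}
```

A count of zero after the FFT call would mean OutBuf was never written. A count that keeps varying while the CPU is halted at different moments would suggest the routine is still rewriting the buffer.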

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

1. Those pointers are set to non-existent memory to avoid any memory corruption if Mag and Phase buffers are used in some non-documented way. Reads and writes to non-existent memory do nothing. But they are not used at all. Success of 16384 points with the same pointers (to 0x340000) proves it.

2. The bug shows itself when you select some pointer and choose "Display as array...". Viewing the original array and using the memory browser shows that everything is OK. Just one more annoying bug; it does not affect the FFT in any way.

3. The OutBuf changes continuously while CPU runs the loop.

4. The most important parts of these important files:

.cmd:

MEMORY
{
PAGE 0 :
   BEGIN        : origin = 0x000000, length = 0x000002     /* Boot to M0 will go here                      */
   RAMM01       : origin = 0x000052, length = 0x0007AE
   RAML07       : origin = 0x008000, length = 0x010000
   RESET        : origin = 0x3FFFC0, length = 0x000002

PAGE 1 :
   BOOT_RSVD    : origin = 0x000002, length = 0x000050     /* Part of M0, BOOT rom will use this for stack */
   RAMH05       : origin = 0x300000, length = 0x030000
}
...

   rfft_in          : > RAMH05,    PAGE = 1
   rfft_out         : > RAMH05,    PAGE = 1
   rfft_coef        : > RAMH05,    PAGE = 1

.map:

rfft_coef   1    00300000    00010000     UNINITIALIZED
                  00300000    00010000     DSP346_FFT.obj (rfft_coef)

rfft_in    1    00310000    00010000     UNINITIALIZED
                  00310000    00010000     DSP346_FFT.obj (rfft_in)

rfft_out   1    00320000    00010000     UNINITIALIZED
                  00320000    00010000     DSP346_FFT.obj (rfft_out)

The InBuf, OutBuf and CosSinBuf are placed in their memory sections by #pragma.

5. CFFT for 16384 points works correctly. Checked by Octave.
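
The placement shown in the .cmd and .map excerpts above can be sanity-checked with a small sketch that tests containment and overlap of address ranges. The Region type and helper names are illustrative; the addresses in the note below are the ones from the excerpts.

```c
#include <stdint.h>

/* An address range in words: [origin, origin + length). */
typedef struct {
    uint32_t origin;
    uint32_t length;
} Region;

/* Returns 1 if `inner` lies entirely inside `outer`. */
static int region_contains(Region outer, Region inner)
{
    return inner.origin >= outer.origin &&
           inner.origin + inner.length <= outer.origin + outer.length;
}

/* Returns 1 if the two ranges share at least one word. */
static int regions_overlap(Region a, Region b)
{
    return a.origin < b.origin + b.length &&
           b.origin < a.origin + a.length;
}
```

With RAMH05 at 0x300000 (length 0x30000) and the three sections at 0x300000, 0x310000 and 0x320000 (length 0x10000 each), every section is contained in RAMH05, no two sections overlap, and together they fill the region exactly.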

Thanks,

Dmitry.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Dmitry,

Thanks for the feedback. I think the next step is for me to try this at my end. I will have to locate the HW for this (can you let me know what Hardware you are using?). OR I should use a board whose memory will allow/just allow the 32768 point FFT to fit in RAM. Please give me until Friday or next Monday to get back to you with an update. I am going to be in training for a good portion of this week.

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

We use a self-made board with the TMS320C28346. The CPU runs without any hardware problems.

I think that C28346 168-pin control card with dock board will fit well for the experiment. Only internal RAM is needed.

Thanks, Dmitry.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Hi Dmitry,

I was able to run the test and reproduce the problem. As you say, it works for the 16384 pt unaligned RFFT, but gets stuck at the location you mention for the 32768 pt unaligned RFFT. I also notice the SinCosTable has some value for SinCosTable[0] and all other values are +Inf.

I need to do further debugging. I will continue looking into this tomorrow.

Thanks,
Sira

0 Sira Rao80 over 6 years ago in reply to Sira Rao80

TI__Mastermind 26110 points

Hi Dmitry,

On further debugging, I notice that when I choose 32768 as the FFT_SIZE, the last 4 values of SinCosBuf are Inf (all other values are OK), even with the function call to RFFT_f32u() commented out.
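
One way to locate such entries without scrolling through the debugger view is to scan the buffer for non-finite values. The sketch below assumes IEEE-754 single precision and tests the exponent bits directly, so it does not depend on the math library; the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if x is +Inf, -Inf or NaN (IEEE-754 single: exponent all ones). */
static int is_non_finite(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7F800000u) == 0x7F800000u;
}

/* Returns the index of the first non-finite element, or n if all are finite. */
static uint32_t first_non_finite(const float *buf, uint32_t n)
{
    uint32_t i;
    for (i = 0; i < n; ++i) {
        if (is_non_finite(buf[i])) {
            return i;
        }
    }
    return n;
}
```

Running this over the twiddle table right after it is generated, for both the 16384 and the 32768 build, would show whether the Inf entries appear before the FFT routine is ever called and at which index they start.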

I compared the .map files for the 16384 and 32768 builds, and there weren't any significant differences.

If we can get to the bottom of the SinCosBuf issue, it may lead to some vital clues. I would suggest you take a stab at this.

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

For one simple harmonic the SinCos values seem to be correct. Maybe it depends strongly on the InBuf values.

Thanks,

Dmitry.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Hi Dmitry,

How we move forward from here needs to be determined. I have internally filed a JIRA issue here at TI to trace this issue and get a root cause. From that standpoint, I would like to mark this issue as Resolved, since at this point, I am unable to devote more time to debugging this issue.

How are you placed? Is this issue a road block for you? Can your application make do with a 16384 pt FFT? Please let me know your thoughts.

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

At this point we are moving to a self-written FFT in C.

I think that the main question (possibility of 32768-point-FFT by DSP FPU library) is not answered. If it IS NOT possible - it should be documented to save users' time. If it is a small bug that can be quickly corrected - it would be very nice to use platform-optimized function to save CPU time and see in the documentation that max value for FFT size is 32768.

So, if your answer is "32768 is not possible at all and documentation will be corrected" I'll mark the problem as resolved.

Thanks,

Dmitry.

0 Sira Rao80 over 6 years ago in reply to Dmitry Ischuk

TI__Mastermind 26110 points

Hi Dmitry,

You raise a good point, and indeed this is also a JIRA issue I filed. In fact, the user's guide at present contains incorrect information - it states that the FFT sizes permitted are 2^5 through 2^10, which is not correct. This range only reflects the values for which pre-computed twiddle factor tables are available. As we have seen, FFT sizes through 2^14 work properly, however 2^15 does not work properly, and I have indicated in the JIRA ticket that this information needs to be present in the user's guide.
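
Until the documentation is updated, a caller could guard against the failing size explicitly. The sketch below derives the stage count from the size and accepts only sizes reported as working in this thread. The lower bound of 2^5 is an assumption taken from the range stated in the user's guide, and the function names are illustrative.

```c
#include <stdint.h>

/* Returns log2(size) if size is a power of two >= 2, otherwise 0. */
static uint16_t fft_stages(uint32_t size)
{
    uint16_t stages = 0;
    if (size < 2u || (size & (size - 1u)) != 0u) {
        return 0;
    }
    while (size > 1u) {
        size >>= 1;
        ++stages;
    }
    return stages;
}

/* Returns 1 for sizes reported as working in this thread (up to 2^14).
 * Assumption: 2^5 is the smallest supported size, per the user's guide. */
static int rfft_size_reported_ok(uint32_t size)
{
    uint16_t stages = fft_stages(size);
    return stages >= 5u && stages <= 14u;
}
```

With this guard, 16384 (14 stages) is accepted, while 32768 (15 stages) and sizes that are not a power of two are rejected before the library is called.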

So, yes, we have one ticket that will attempt to address the issue with the 32768 point FFT, and another ticket that will address user guide documentation updates.

Please go ahead and mark the issue as resolved.

Thanks,
Sira

0 Dmitry Ischuk over 6 years ago in reply to Sira Rao80

Prodigy 90 points

Hello Sira,

Ok.

Thanks,

Dmitry.
