Problem when trying VCU FFT on Concerto M35

Igor Kordunsky

Intellectual 835 points


Hello,

I am trying to port example code written for the 2837x processor (an FFT using the VCU) to the M35 processor.

I have the "latest" CCS 6.0.1.00040 and controlSUITE, and a standard Concerto M35 controlCARD from TI.

I started with the standard M35 blinky_C28 example from:

C:\ti\controlSUITE\device_support\f28m35x\v204\F28M35x_examples_Control\blinky\c28

and, following the instructions in "C28v-VCU-LIB-UG.pdf", added code from this example:

C:\ti\controlSUITE\libs\dsp\VCU\v2_00_00_00\examples_ccsv5\fft\2837x_vcu2_rfft_256

With a few tweaks to the cmd file I managed to build the example. It runs up to line 228 in the assembler file "vcu2_cfft_128.asm":

228: VCFFT8 VR3, VR2, #1 ;[VR2H:VR2L] := [R2 - R3:R2 + R3] := [VR2L - VR3L:VR2L + VR3L]
;[VR3H:VR3L] := [I2 - I3:I2 + I3] := [VR2H - VR3H:VR2H + VR3H]

When executing this line, the C28 processor jumps to ILLEGAL_ISR(void), line 98 in "F28M35x_SysCtrl.c"; it looks like it cannot execute this instruction.

What is wrong? Why can't the C28 coprocessor in the M35 execute the VCFFT8 instruction?

I tried placing the program in RAM and in FLASH; it does not help. One POSSIBLE difference between the F2837xD and the M35: the twiddle factors are in ROM on the F2837xD. I am not sure whether they exist in the M35's C28 ROM, so I put them in RAM.

I exported the project to a zip file; here it is:

6371.M35_VCU_FFT.zip

Regards,

Igor

over 9 years ago

0 Vishal_Coelho over 9 years ago

TI__Mastermind 20850 points

Hi Igor,

Concerto has a Type 0 VCU. VCFFT8 is a VCU Type 2 instruction and will not work on Concerto: when the device sees the instruction it treats it as an illegal opcode and jumps to the default handler, which is what you are seeing. The Delfino 2837xD (and 2837xS) supports Type 2.

On Concerto you will have to use "c28x_vcu0_library.lib".

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Hi, Vishal,

Thanks for the tip.

I successfully created and tested an example running on the Concerto F28M35 using VCU0, attached as a zip file.

You could suggest adding this example to controlSUITE for reference.

1033.M35_VCU0_FFT.zip

The results of the hardware-assisted 16-bit FFT:

// Allowed Error: within error ; outside error

// EPSILON = 0: pass = 276 bins; fail = 236 bins.
// EPSILON = 1: pass = 469 bins; fail =  43 bins.
// EPSILON = 2: pass = 503 bins; fail =   9 bins.
// EPSILON = 3: pass = 507 bins; fail =   5 bins.
// EPSILON = 4: pass = 511 bins; fail =   1 bin.
// EPSILON = 5: pass = 512 bins; fail =   0 bins.
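
For readers who want to reproduce this kind of table, here is a minimal sketch of the pass/fail count, assuming the FFT output and the precalculated reference are both arrays of 16-bit values. The names count_bins() and bin_count_t are made up for illustration and are not part of the VCU library.

```c
#include <stdlib.h>

/* Pass/fail totals for one EPSILON setting (hypothetical type). */
typedef struct {
    unsigned pass;
    unsigned fail;
} bin_count_t;

/* Counts how many bins of `actual` lie within +/-epsilon of `expected`. */
bin_count_t count_bins(const short *actual, const short *expected,
                       unsigned n, int epsilon)
{
    bin_count_t c = {0u, 0u};
    unsigned k;

    for (k = 0; k < n; k++) {
        /* Widen to long first: int is only 16 bits on the C28x, so the
           difference of two 16-bit samples could overflow an int. */
        long diff = (long)actual[k] - (long)expected[k];

        if (labs(diff) > (long)epsilon) {
            c.fail++;
        } else {
            c.pass++;
        }
    }
    return c;
}
```

Calling it once per EPSILON value over all 512 output words would give one row of the table each time.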

Igor

0 Igor Kordunsky over 9 years ago in reply to Igor Kordunsky

Intellectual 835 points

Hi, Vishal,

Now I am trying to use FPU to do FFT, and have a question:

Does Concerto have TMU (Trigonometric Math Unit)?

Thanks,

Igor

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Hi Igor,

No, Concerto does not. It does have an FPU, though, and you should be able to use the FPU library's FFT. The Delfino 2837x is the first device to have the TMU. The only DSP functions that use the TMU are some of the vector operations and, in future revisions, the magnitude and phase functions.

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Hi, Vishal,

I found that if I put the twiddle factor (TF) table in RAM, the number of errors is reduced. For example, with allowed error (EPSILON) = 1, it drops from fail=43 when the TF table is in FLASH to fail=1 when it is in RAM. I use the same code:

#pragma DATA_SECTION (fft_TF, ".VCU_TwiddleFact"); //located in RAML3
SINT16 fft_TF[1024];
...................
    fft.init(&fft);
    memcpy(&fft_TF, &CFFT16_TF, sizeof(fft_TF));
    fft.tfptr = fft_TF; //replace twiddle factors pointer to RAM copy

and either manually skip the fft.tfptr update, or let it replace the default pointer to FLASH with the pointer to the RAM copy.

Strange, isn't it?

But I intended to do this anyway to avoid slow reads from FLASH (the C28 clock is 150 MHz, so according to the technical guide it needs 3 wait states).

Actually, my goal is to make it as fast as possible, and I want to run the FFT calculations from RAM. My real code, where I want to integrate the FFT, starts from FLASH.

I know how to put C functions into the "ramfuncs" section, but these are assembler functions. They are assigned to the ".text" section, so they are placed in FLASH.

Question: how do I put these assembler functions in RAM?

Thanks

Igor

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Hi Igor,

Igor Kordunsky said:
Question: how to put these assembler functions in the RAM?

You could do this in the linker command file:

1. Place the entire .text of the VCU library in the ramfuncs section:

ramfuncs {
  -l=c28x_vcu0_library.lib (.text)
   } :LOAD = FLASHD,
      RUN = RAMM0,
      RUN_START(_RamfuncsRunStart),
      LOAD_START(_RamfuncsLoadStart),
      LOAD_SIZE(_RamfuncsLoadSize),
      PAGE = 0

2. Place just the FFT functions in ramfuncs; you only need to specify which obj files' .text sections get put into ramfuncs:

ramfuncs {
  -l=c28x_vcu0_library.lib<vcu0_cfft_256.obj> (.text)
  -l=c28x_vcu0_library.lib<vcu0_cfft_utils.obj> (.text)
   } :LOAD = FLASHD,
      RUN = RAMM0,
      RUN_START(_RamfuncsRunStart),
      LOAD_START(_RamfuncsLoadStart),
      LOAD_SIZE(_RamfuncsLoadSize),
      PAGE = 0
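
With either option the section is only loaded into FLASH; the application still has to copy it to its run address before calling anything in it. Below is a minimal sketch of that copy, written so it also builds on a host where plain arrays stand in for FLASH and RAM. copy_section() is a hypothetical helper, not a TI function. On the device the arguments would be the linker-generated symbols, as in TI's examples.

```c
#include <stddef.h>
#include <string.h>

/* Copies a section image from its load address to its run address.
   Returns run_start, like memcpy(). */
void *copy_section(void *run_start, const void *load_start, size_t size)
{
    return memcpy(run_start, load_start, size);
}

/* On the target the call would look like this (not compiled here):
 *
 *   extern unsigned int RamfuncsLoadStart;
 *   extern unsigned int RamfuncsLoadSize;
 *   extern unsigned int RamfuncsRunStart;
 *
 *   copy_section(&RamfuncsRunStart, &RamfuncsLoadStart,
 *                (size_t)&RamfuncsLoadSize);
 *
 * LOAD_SIZE() defines a symbol whose address IS the size, which is why
 * the address of RamfuncsLoadSize is cast to size_t rather than its value.
 */
```

The copy has to happen before the first call into the relocated code; otherwise the CPU fetches whatever happens to be in RAM at the run address.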

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Hi, Vishal,

Thanks, I successfully placed the code into RAM using a modified version of method 2: I created another SECTION for the asm code in RAM. See the attached zipped project.

But what can you tell me about the mystery? The results of the calculation depend on where the twiddle factor table is located, and they are more accurate when it is in RAM:

//----------------+---------------------------------+---------------------------------|
//                |     Twiddle Table in FLASH      |      Twiddle Table in RAM       |
//----------------+-----------------+---------------+-----------------+---------------|
// Allowed Error  | within error    | outside error | within error    | outside error |
//----------------+-----------------+---------------+-----------------+---------------|
//    EPSILON = 0 | pass = 276 bins | fail=236 bins | pass = 329 bins | fail=183 bins |
//    EPSILON = 1 | pass = 469 bins | fail= 43 bins | pass = 511 bins | fail=  1 bin  |
//    EPSILON = 2 | pass = 503 bins | fail=  9 bins | pass = 512 bins | fail=  0 bins |
//    EPSILON = 3 | pass = 507 bins | fail=  5 bins +-----------------+---------------|
//    EPSILON = 4 | pass = 511 bins | fail=  1 bin  |
//    EPSILON = 5 | pass = 512 bins | fail=  0 bins |
//----------------+-----------------+---------------+

Regards,

Igor

M35_VCU0_FFT_v2.zip

0 Igor Kordunsky over 9 years ago in reply to Igor Kordunsky

Intellectual 835 points

Hi, Vishal,

I am testing your way:

   ramfuncs         {
                      -l=c28x_vcu0_library_fpu32.lib<vcu0_cfft_256.obj> (.text)
                      -l=c28x_vcu0_library_fpu32.lib<vcu0_cfft_utils.obj> (.text)
                    } : LOAD = FLASHD,
                         RUN = RAMLL0,
                         LOAD_START(_RamfuncsLoadStart),
                         LOAD_SIZE(_RamfuncsLoadSize),
                         RUN_START(_RamfuncsRunStart),
                         PAGE = 0

and the linker complains about the line marked in red:

"C:..../c28_link.cmd", line 157: error:

expecting output section, GROUP, or UNION instead of "="

Question: What is the correct syntax?

My [additional] section ".asm_text_ram" works:

SECTIONS
{

......
/* next section places whole file[s] with all functions they have into RAM (can be C or ASM file) */
   .asm_text_ram       : LOAD = FLASHD,// place code into FLASH for start, then set different RUN address. ! DO NOT FORGET TO COPY section into RAM !
                         RUN = RAML0, // ! see "main.c": memcpy(&asm_text_ram_runstart, &asm_text_ram_loadstart, (size_t)&asm_text_ram_loadsize);
                         LOAD_START(_asm_text_ram_loadstart),
                         LOAD_SIZE(_asm_text_ram_loadsize),
                         RUN_START(_asm_text_ram_runstart),
                         PAGE = 0
                         {
                             vcu0_cfft_256.obj, // *.obj files are generated from assembler files, these will be loaded into FLASH but linked as if they are in RAM,
                             vcu0_cfft_utils.obj // need to copy! see "main.c": memcpy(&asm_text_ram_runstart, &asm_text_ram_loadstart, (size_t)&asm_text_ram_loadsize);
                         }
Igor

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Igor Kordunsky said:

"C:..../c28_link.cmd", line 157: error:

expecting output section, GROUP, or UNION instead of "="

I got the syntax slightly wrong (the colon was in the wrong place). It should be:

 ramfuncs:{
           -l=c28x_vcu0_library_fpu32.lib<vcu0_cfft_256.obj> (.text)
           -l=c28x_vcu0_library_fpu32.lib<vcu0_cfft_utils.obj> (.text)
           }  LOAD = FLASHD,
              RUN = RAMLL0,
              LOAD_START(_RamfuncsLoadStart),
              LOAD_SIZE(_RamfuncsLoadSize),
              RUN_START(_RamfuncsRunStart),
              PAGE = 0

You can refer to section 8.5.4.7 of the assembler guide (SPRUG513G) for additional examples.

As for the difference between the FLASH and RAM results: there should be none. The only reason I can think of for a discrepancy is that the twiddle factors are different (corruption, or maybe they did not get written).

Also, make sure that the input buffer (the one that gets bit-reversed) is aligned to a 2N boundary (N = FFT size), or this will surely mess up your calculations.
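
The alignment requirement can be checked at run time before the FFT is called. This is a minimal sketch, assuming "2N boundary" means a multiple of 2*N 16-bit words and that the buffer address is available as a word address (as it is on the C28x). is_aligned_2n() is a hypothetical helper, not part of the VCU library.

```c
#include <stdint.h>

/* Returns nonzero if word_addr is a multiple of 2*n words,
   where n is the FFT size (e.g. 256). */
int is_aligned_2n(uint32_t word_addr, uint32_t n)
{
    uint32_t boundary = 2u * n;   /* 512 words for a 256-point FFT */

    return (word_addr % boundary) == 0u;
}
```

For a 256-point FFT, a buffer starting at word address 0x0400 passes this check, while one starting at 0x0500 does not.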

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Thanks, Vishal,

I suggest you try my example.

As you can see from it, I use memcpy() to transfer the TF table into RAM.

I double-checked the image of the twiddle factors in RAM against the one in FLASH, using the same method as when comparing the FFT results, both after memcpy() and after the FFT has done its job. They are identical. I removed this check from my example after I had run it.

Of course I aligned the buffers as recommended; without that the code either crashes or produces all "fail" comparisons.

The accuracy difference seems to be real.

Could the FLASH wait states be upsetting the (manually?) optimized parallel operations or the pipeline in the assembler code, such as performing calculations simultaneously with memory reads and writes?

Igor

main.c is below for reference:

#include "DSP28x_Project.h" // Examples Include File
#include <stdlib.h> // abs()
#include <string.h>
#define F28_DATA_TYPES //prevents warnings about already defined types, like Int32
#include "vcu0_types.h"
#include "vcu0_fft.h"

//!
//! \addtogroup F28M35 VCU0_RIFFT_EXAMPLES Real Fast Fourier Transform (N = 256) Example
// @{

//*****************************************************************************
// defines
//*****************************************************************************
#define DATA_LENGTH (256<<1)
#define EPSILON 1 //allowed error vs precalculated test output
#define RAM_TFtbl //switch to put Twiddle Factors table in RAM

//*****************************************************************************
// globals
//*****************************************************************************
cfft16_t fft = rifft16_256P_DEFAULTS;

//these variables are automatically generated by linker, see "F28M35x_VCU0_C28_FLASH.cmd", SECTION .asm_text_ram
extern unsigned int asm_text_ram_loadstart;
extern unsigned int asm_text_ram_loadsize;
extern unsigned int asm_text_ram_runstart;

#ifdef __cplusplus
#pragma DATA_SECTION (".shadow");
#else
#pragma DATA_SECTION (fft_work_buffer, ".shadow");
#endif //__cplusplus
ComplexShort fft_work_buffer[256];

#ifdef __cplusplus
#pragma DATA_SECTION (".FFTinput");
#else
#pragma DATA_SECTION (fft_input, ".FFTinput");
#endif //__cplusplus
SINT16 fft_input[256];

#ifdef __cplusplus
#pragma DATA_SECTION (".VCU_TwiddleFact");
#else
#pragma DATA_SECTION (fft_TF, ".VCU_TwiddleFact");
#endif //__cplusplus
SINT16 fft_TF[1024];

//Global Data
UINT32 err=0;
SINT16 *dataIn_p, *dataOut_p;
Uint16 pass = 0;
Uint16 fail = 0;

//*****************************************************************************
// Function Prototypes
//*****************************************************************************
#pragma CODE_SECTION(rifft16, "ramfuncs");
void rifft16(cfft16_t *fft_hnd);

//*****************************************************************************
// function definitions
//*****************************************************************************
//!
//! \brief main routine for the 256-sample RFFT example
//! \return none
//!
//! This example shows how to use the vcu0 supported CFFT routines from the
//! library to perform a Real Inverse FFT. The input is placed in the .econst section
//! and needs to be aligned to the size of the input in words to allow the bit reverse
//! addressing in stage 1 of the FFT to work properly. The output, however, need
//! not be aligned to any boundary
//!
void main(void)
{
    unsigned long delay, k;

    // Step 1. Initialize System Control:
    // PLL, WatchDog, enable Peripheral Clocks
    // This example function is found in the F28M35x_SysCtrl.c file.
    InitSysCtrl();

    // Step 2. Initialize GPIO:
    // This example function is found in the F28M35x_Gpio.c file and
    // illustrates how to set the GPIO to its default state.
    InitGpio(); // Skipped for this example
    EALLOW;
    LED_0_DIR_REG = 1;
    EDIS;
    LED_0_DAT_REG = 1;// turn off LED

    // Step 3. Clear all interrupts and initialize PIE vector table:
    // Disable CPU interrupts
    DINT;

#ifdef _FLASH
    // Copy time critical code and Flash setup code to RAM
    // This includes the following functions: InitFlash();
    // The RamfuncsLoadStart, RamfuncsLoadSize, and RamfuncsRunStart
    // symbols are created by the linker. Refer to the device .cmd file.
    memcpy(&RamfuncsRunStart, &RamfuncsLoadStart, (size_t)&RamfuncsLoadSize);

    // Call Flash Initialization to setup flash waitstates
    // This function must reside in RAM
    InitFlash();
#endif

    // Initialize the PIE control registers to their default state.
    // The default state is all PIE interrupts disabled and flags
    // are cleared.
    // This function is found in the F28M35x_PieCtrl.c file.
    InitPieCtrl();

    // Disable CPU interrupts and clear all CPU interrupt flags:
    IER = 0x0000;
    IFR = 0x0000;

    // Initialize the PIE vector table with pointers to the shell Interrupt
    // Service Routines (ISR).
    // This will populate the entire table, even if the interrupt
    // is not used in this example. This is useful for debug purposes.
    // The shell ISR routines are found in F28M35x_DefaultIsr.c.
    // This function is found in F28M35x_PieVect.c.
    InitPieVectTable();

    // Enable global Interrupts and higher priority real-time debug events:
    EINT; // Enable Global interrupt INTM
    ERTM; // Enable Global realtime interrupt DBGM

    //*************************************************************************
    // Running the FFT
    //*************************************************************************
    //! \b Running \b the \b FFT
    //! The user declares the cfft16_t Object with the appropriate
    //! init() and run() (often specified in the DEFAULT macros provided in the
    //! header file). The init() will load the twiddle factor table pointer
    //! to the structure. For the F28X7x devices, the twiddle factor tables are
    //! present in bootrom and may be used instead of the table provided with
    //! the library source code
    //! \code
    //*************************************************************************
    // Step 0: Copy the assembler FFT code and the input buffer into RAM
    memcpy(&asm_text_ram_runstart, &asm_text_ram_loadstart, (size_t)&asm_text_ram_loadsize);
    memcpy(&fft_input, &RIFFT16_256p_in_data, (size_t)(DATA_LENGTH>>1));

    // Step 1: Initialize CFFT object
    fft.ipcbptr = (int *)fft_input + 0; //input buffer pointer
    fft.workptr = (int *)fft_work_buffer + 0; //work and output buffer pointer
    fft.init(&fft);
#ifdef RAM_TFtbl
    memcpy(&fft_TF, &CFFT16_TF, sizeof(fft_TF)); //sizeof is 1024, and we need whole table
    fft.tfptr = fft_TF; //replace twiddle factors pointer to RAM copy
#endif

    // Step 2: Pack the input; bit reverse it then run the FFT. Finally,
    // flip the real and imaginary parts of the output to obtain the
    // real inverse FFT
    rifft16(&fft);
    //*************************************************************************
    //!
    //! \endcode
    //!
    //*************************************************************************

    // Step 3: Verify the result
    for (k=0; k<DATA_LENGTH; k++) {
        if (abs(RIFFT16_256p_out_data[k] - fft_input[k]) > EPSILON){
            fail++;
        }else{
            pass++;
        }
    }

/**************************************************************************************
// Results of F28M35 C28 coprocessor:
//----------------+---------------------------------+---------------------------------|
//                |     Twiddle Table in FLASH      |      Twiddle Table in RAM       |
//----------------+-----------------+---------------+-----------------+---------------|
// Allowed Error  | within error    | outside error | within error    | outside error |
//----------------+-----------------+---------------+-----------------+---------------|
//    EPSILON = 0 | pass = 276 bins | fail=236 bins | pass = 329 bins | fail=183 bins |
//    EPSILON = 1 | pass = 469 bins | fail= 43 bins | pass = 511 bins | fail=  1 bin  |
//    EPSILON = 2 | pass = 503 bins | fail=  9 bins | pass = 512 bins | fail=  0 bins |
//    EPSILON = 3 | pass = 507 bins | fail=  5 bins +-----------------+---------------|
//    EPSILON = 4 | pass = 511 bins | fail=  1 bin  |
//    EPSILON = 5 | pass = 512 bins | fail=  0 bins |
//----------------+-----------------+---------------+
// The failed bins have same sign of result, but the value differs more than EPSILON
***************************************************************************************/

    // Step 6. IDLE loop. IF PROGRAM POINTER IS HERE, IT DID NOT CRASH. Just sit and loop forever (optional):
    for(;;)
    {
        LED_0_DAT_REG = 0; // Turn on LED

        // Delay for a bit.
        for(delay = 0; delay < 2000000; delay++)
        {
        }

        LED_0_DAT_REG = 1; // Turn off LED

        // Delay for a bit.
        for(delay = 0; delay < 2000000; delay++)
        {
        }
    }
} // End of main

0 Igor Kordunsky over 9 years ago in reply to Igor Kordunsky

Intellectual 835 points

Hi, Vishal,

Did you have a chance to try my example?

I stand by my observation: the VCU FFT is more precise when the twiddle factors are in RAM than in FLASH, which should not be the case, as you correctly pointed out.

Either the library function for the VCU0 FFT has a bug, or I am wrong and/or missing something.

Igor

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Hi Igor,

Sorry, I didn't have the chance to take a look at it today. I will in the morning, and I'll let you know what I find.

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Hi, Vishal,

I discovered some more weirdness:

I am rearranging RAM usage for my real application, and decided to move .FFTinput from RAML3 into RAMM1, PAGE=1, ALIGN=256.

Well, the program crashes after the cifft16_pack_asm(fft_hnd) call, which corrupts the fft structure. I suspected that something else was interfering, or that this function writes beyond the allowed buffer. The only other section placed in RAMM1 is ".ebss".

I moved .ebss into RAMM0 and the program runs again.

To summarize, this crashes the program (corrupts the fft structure):

.stack : > RAMM0 PAGE = 1
.ebss : > RAMM1 PAGE = 1 //both .FFTinput and .ebss in the same block, .FFTinput is at the beginning of RAMM1, .ebss after

.FFTinput : > RAMM1, ALIGN = 256, PAGE = 1
.shadow : > RAML3, ALIGN = 256, PAGE = 1

This works:

.stack : > RAMM0 PAGE = 1
.ebss : > RAMM0 PAGE = 1

.FFTinput : > RAMM1, ALIGN = 256, PAGE = 1 //start address 0x400
.shadow : > RAML3, ALIGN = 256, PAGE = 1

Looking at memory, I see that cfft16_brev() writes up to address 0x5FF, so it writes (illegally?) beyond fft_input[256], taking (illegally?) another 256 words.

Is it supposed to write to ComplexShort fft_work_buffer[256], which has sizeof=512 (and maybe afterwards copy the first 256 words back to fft_input[256])?

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Hi Igor,

I tried your code, but I'm not seeing any difference in the pass/fail numbers when I switch between RAM and FLASH.

I tried EPSILON = 1, 2, 3 with RAM_TFtbl commented and uncommented, and I am seeing exactly the same result in all 3 cases for FLASH and RAM.

I am, however, using CGT 6.2.6, as I think there were issues with the pointer arithmetic in CGT 6.2.9 (I might be wrong). Could you try switching to either the latest CGT 6.4.0 or to 6.2.6 and see if that helps? I can't see anything wrong with your code and am unable to reproduce the issue on my end.

0 Vishal_Coelho over 9 years ago in reply to Igor Kordunsky

TI__Mastermind 20850 points

Ah, I see the issue now. The fft_input size should be 512, not 256: the pack routine starts "packing" from the top and bottom of the array, hence the overwrite you were seeing up to 0x5FF. The fft structure was being corrupted in the packing process. It corrupted the pointer to calc(), and when the code tried to call it, you got an illegal opcode trap and ended up in the illegal-opcode ISR.

This was happening because the input buffer was half the required size and the structure (.ebss) was in the computation space of the pack routine.
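
The numbers in this case can be worked through with a little address arithmetic. The sketch below uses the figures from the posts above (buffer at word address 0x400, 256 words declared, 512 words used by the pack routine); both helper names are hypothetical.

```c
#include <stdint.h>

/* Last word address occupied by a buffer of n_words starting at start. */
uint32_t last_word_addr(uint32_t start, uint32_t n_words)
{
    return start + n_words - 1u;
}

/* Number of words a routine writes past the end of a buffer that was
   declared with declared_words but is used as if it had used_words. */
uint32_t overrun_words(uint32_t declared_words, uint32_t used_words)
{
    return (used_words > declared_words) ? (used_words - declared_words) : 0u;
}
```

With fft_input[256] at 0x400 the declared buffer ends at 0x4FF, but a 512-word working area ends at 0x5FF, so 256 words of whatever follows (here .ebss, including the fft structure) get overwritten.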

0 Igor Kordunsky over 9 years ago in reply to Vishal_Coelho

Intellectual 835 points

Vishal,

I dug deeper and found that my observation of a "mysterious" difference between the TF table in RAM vs FLASH was due to:

a) wrongly assigned sizes and alignments in main.c and the *.cmd files

b) placing the TF table in RAM, in the same segment (RAML3) as .FFTinput and .shadow. This forced the linker to put the largest piece (fft_TF) first, at 0xB000, then the smaller piece (fft_work_buffer[]) at 0xB400, then the smallest piece (fft_input) at 0xB600. Because nothing else is placed in RAML3, the overrun past the end of fft_input[256] had no visible effect, and the calculation was correct (pass=511, fail=1).

NOTE: for better control from one place (main.c) instead of two places (main.c and the ALIGN directive in the *.cmd files), I put the alignment into the C code using: #pragma DATA_ALIGN ( fft_input, 256); //MUST BE ALIGNED

//// this SEEMS TO BE WORKING even though the assigned sizes and alignments are WRONG

.FFTinput : > RAML3, PAGE = 1 // ALIGN = 256, //place for fft_input[256]
.shadow : > RAML3, PAGE = 1 // ALIGN = 256, //place for fft_work_buffer[256]
/* THIS placement SPEEDS UP calculations (reading const from FLASH needs wait states, from RAM - without wait) */
.VCU_TwiddleFact : > RAML3, PAGE = 1 // need to copy Twiddle Factors table from FLASH to RAM, see "main.c", memcpy(&fft_TF, &CFFT16_TF, sizeof(fft_TF));

When the TF table was left in FLASH or put in a different segment, RAMM1, the linker put the .shadow SECTION (ComplexShort fft_work_buffer[256]) first in RAML3, at 0xB000, then the smaller .FFTinput SECTION (Int16 fft_input[256]) at 0xB200, and the result was incorrect (pass=469, fail=43) due to wrong alignment.

//// this is WRONG DUE TO MISALIGNMENT (and wrong size buffer)
.FFTinput : > RAML3, PAGE = 1 // ALIGN = 256, //place for Int16 fft_input[256]
.shadow : > RAML3, PAGE = 1 // ALIGN = 256, //place for ComplexShort fft_work_buffer[256]
.VCU_TwiddleFact : > RAMM1, PAGE = 1 //place for Int16 fft_TF[1024]
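
The two layouts described above can be reproduced with a small model. This is only a sketch of the behaviour observed in this project (largest section first, packed back to back, no ALIGN), not a description of how the TI linker allocates in general; section_t and place_largest_first() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* One output section: name, size in 16-bit words, assigned word address. */
typedef struct {
    const char *name;
    uint32_t    size_words;
    uint32_t    addr;
} section_t;

/* qsort comparator: larger sections sort first. */
static int by_size_desc(const void *a, const void *b)
{
    const section_t *sa = (const section_t *)a;
    const section_t *sb = (const section_t *)b;

    if (sa->size_words < sb->size_words) return 1;
    if (sa->size_words > sb->size_words) return -1;
    return 0;
}

/* Sorts the sections by descending size and packs them from base upward. */
void place_largest_first(section_t *s, size_t count, uint32_t base)
{
    size_t i;

    qsort(s, count, sizeof s[0], by_size_desc);
    for (i = 0; i < count; i++) {
        s[i].addr = base;
        base += s[i].size_words;
    }
}
```

With sizes of 1024, 512 and 256 words and a base of 0xB000, the model gives 0xB000, 0xB400 and 0xB600, matching case b). With only the 512-word and 256-word sections it gives 0xB000 and 0xB200, matching the second case.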

So, the proper buffer sizes and alignments should be:

#define VCU0_FFT_STAGES 8
#define VCU0_FFT_SIZE (1<<VCU0_FFT_STAGES) // 256 Int16 words
#define DATA_LENGTH (VCU0_FFT_SIZE<<1) // 512 Int16 words

#ifdef __cplusplus
#pragma DATA_SECTION (".shadow");
#else
#pragma DATA_SECTION (fft_work_buffer, ".shadow");
#endif //__cplusplus
ComplexShort fft_work_buffer[VCU0_FFT_SIZE];
#pragma DATA_ALIGN ( fft_work_buffer, VCU0_FFT_SIZE<<1); // MUST BE ALIGNED

#ifdef __cplusplus
#pragma DATA_SECTION (".FFTinput");
#else
#pragma DATA_SECTION (fft_input, ".FFTinput");
#endif //__cplusplus
SINT16 fft_input[VCU0_FFT_SIZE<<1];
#pragma DATA_ALIGN ( fft_input, VCU0_FFT_SIZE<<1); //MUST BE ALIGNED

See the attached main.c and *.cmd files.

3617.VCU0_FFT_blinky_v3.zip
