TMS320F28379D: Moving FFT from CPU1 to CLA (bad computation)

Part Number: TMS320F28379D

Hello all, I had an FFT running on CPU1 with 2048 samples, working together with the DMA: the DMA collected ADC samples into a buffer, and I did the calculations from that buffer. Now I want to move the FFT to the CLA (to save time), but it does not work properly. On the CLA I work with 1024 samples: I copy the filled DMA buffers to IOBuffer and compute the FFT on the CLA. When the calculation finishes I get the CLA ISR, where I calculate the magnitudes, from which I then derive other quantities (RMS, energy, phase shifts, etc.). This is where I see the errors. I use an ePWM to trigger the ADC so that I get exactly the required number of samples (now 1024) per period of a 50 Hz (20 ms) sine signal (ePWM TBCLK = 100 MHz; CLKDIV = 0; TBPRD = 1953; CMPA = 977).
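The sampling arithmetic in the post can be checked with a few lines of C. This is only a sketch: it assumes the ePWM time base runs in up-count mode, where one period lasts TBPRD + 1 TBCLK cycles (the post does not state the counter mode), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: up-count mode, so one ADC trigger period is TBPRD + 1
 * TBCLK cycles. In up-down mode the period would be 2 * TBPRD instead. */
static double window_seconds(double tbclk_hz, uint32_t tbprd, uint32_t samples)
{
    assert(tbclk_hz > 0.0);
    return (double)samples * (double)(tbprd + 1U) / tbclk_hz;
}

/* Ideal (generally non-integer) number of TBCLK cycles per sample needed
 * to fit exactly `samples` samples into one period of the input signal. */
static double ideal_cycles_per_sample(double tbclk_hz, double signal_hz,
                                      uint32_t samples)
{
    assert(signal_hz > 0.0 && samples > 0U);
    return tbclk_hz / (signal_hz * (double)samples);
}
```

With the values from the post (100 MHz, 50 Hz, 1024 samples) the ideal spacing is 1953.125 cycles. Under the up-count assumption, TBPRD = 1953 gives 1954 cycles per sample, so 1024 samples span about 20.009 ms rather than exactly 20 ms.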

Does RFFTmag also have to be in LS RAM? I don't have enough space there, so I placed it in GS RAM.

My Linker: 2837xD_FLASH_lnk_cpu1.rar


My CLA code:

/*--------------------------------------------*/
//FFT CLA
/*--------------------------------------------*/
RFFT_F32_STRUCT rfft; // The FFT object

#pragma DATA_SECTION(IOBuffer,"IOBuffer");   //Buffer alignment for the input array,
float32 IOBuffer[RFFT_SIZE];                   //RFFT_f32u(optional), RFFT_f32(required)
                                             //Output of FFT overwrites input if
                                            //RFFT_STAGES is ODD
#pragma DATA_SECTION(IOBuffer2,"IOBuffer");
float32 IOBuffer2[RFFT_SIZE];                  //Output of FFT here if RFFT_STAGES is EVEN

#pragma DATA_SECTION(RFFTmagBuff,"RFFTmag");
float32 RFFTmagBuff[RFFT_SIZE/2+1];            //Additional Buffer used in Magnitude calc

#pragma DATA_SECTION(RFFTF32Coef,"RFFTtwiddles");
float32 RFFTF32Coef[512];                 //Twiddle buffer

void init_Cla(void)
{
    extern uint32_t Cla1funcsRunStart, Cla1funcsLoadStart, Cla1funcsLoadSize;
    extern uint32_t Cla1ConstRunStart, Cla1ConstLoadStart, Cla1ConstLoadSize;

    EALLOW;

#ifdef _FLASH
    //CLA MEMORY INIT
    memcpy((uint32_t *)&Cla1funcsRunStart, (uint32_t *)&Cla1funcsLoadStart, (uint32_t)&Cla1funcsLoadSize );
    memcpy((uint32_t *)&Cla1ConstRunStart, (uint32_t *)&Cla1ConstLoadStart, (uint32_t)&Cla1ConstLoadSize );
#endif

    // Initialize and wait for CLA1ToCPUMsgRAM
    MemCfgRegs.MSGxINIT.bit.INIT_CLA1TOCPU = 1;
    while(MemCfgRegs.MSGxINITDONE.bit.INITDONE_CLA1TOCPU != 1){};

    // Initialize and wait for CPUToCLA1MsgRAM
    MemCfgRegs.MSGxINIT.bit.INIT_CPUTOCLA1 = 1;
    while(MemCfgRegs.MSGxINITDONE.bit.INITDONE_CPUTOCLA1 != 1){};

    //LS RAM CONTROL pre CLA

    //PROGRAM SPACE
    MemCfgRegs.LSxMSEL.bit.MSEL_LS2 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS2 = 1;

    MemCfgRegs.LSxMSEL.bit.MSEL_LS3 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS3 = 1;

    //DATA SPACE
    MemCfgRegs.LSxMSEL.bit.MSEL_LS0 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS0 = 0;

    MemCfgRegs.LSxMSEL.bit.MSEL_LS1 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS1 = 0;

    MemCfgRegs.LSxMSEL.bit.MSEL_LS4 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS4 = 0;

    MemCfgRegs.LSxMSEL.bit.MSEL_LS5 = 1;
    MemCfgRegs.LSxCLAPGM.bit.CLAPGM_LS5 = 0;

    EDIS;

    //TASKS
    EALLOW;
    Cla1Regs.MVECT1 = (uint16_t)(&Cla1Task1);
//  Cla1Regs.MVECT2 = (uint16_t)(&Cla1Task2);
//  Cla1Regs.MVECT3 = (uint16_t)(&Cla1Task3);
//  Cla1Regs.MVECT4 = (uint16_t)(&Cla1Task4);
//  Cla1Regs.MVECT5 = (uint16_t)(&Cla1Task5);
//  Cla1Regs.MVECT6 = (uint16_t)(&Cla1Task6);
//  Cla1Regs.MVECT7 = (uint16_t)(&Cla1Task7);
    Cla1Regs.MVECT8 = (uint16_t)(&Cla1Task8);


    //Trigger set
    DmaClaSrcSelRegs.CLA1TASKSRCSEL1.bit.TASK1  = CLA_TRIG_NOPERPH;
    DmaClaSrcSelRegs.CLA1TASKSRCSEL2.bit.TASK8  = CLA_TRIG_NOPERPH;
    Cla1Regs.MIER.all                           = (M_INT1 | M_INT8);

    //Sw task force enable
    Cla1Regs.MCTL.bit.IACKE = 1;

    //Cla ISR
    PieVectTable.CLA1_1_INT          = &cla1Isr1;
    PieCtrlRegs.PIEIER11.bit.INTx1   = 0x01;
    IER |= (M_INT11);
    EDIS;

}

void init_ClaFFT(void)
{
    memset(&IOBuffer, 0, sizeof(IOBuffer));
    memset(&IOBuffer2, 0, sizeof(IOBuffer2));
    memset(&RFFTmagBuff, 0, sizeof(RFFTmagBuff));

    rfft.FFTSize   = RFFT_SIZE; //(1 << RFFT_STAGES)
    rfft.FFTStages = RFFT_STAGES; //(10U)
    rfft.InBuf     = &IOBuffer[0];     //Input buffer
    rfft.OutBuf    = &IOBuffer2[0];    //Output buffer
    rfft.CosSinBuf = &RFFTF32Coef[0];  //Twiddle factor buffer
    rfft.MagBuf    = &RFFTmagBuff[0];  //Magnitude buffer

    RFFT_f32_sincostable(&rfft);       //Calculate twiddle factor
}

CLA:

__interrupt void Cla1Task1 ( void )
{
   //__mdebugstop();
   CLA_CFFT_run512Pt();
   CLA_CFFT_unpack512Pt();
}

__interrupt void cla1Isr1 ()
{
    if(CLASignalType == 0)
    {
        //Volt struct
        FFT_Computation_Cla(&volt1, rfft.OutBuf); //Harmonic from rfft.OutBuf
    }
    else
    {
        if(CLASignalType == 1)
        {
            //Curr struct
            FFT_Computation_Cla(&curr1, rfft.OutBuf);
        }
    }

    ClaComputationDone = true;
    PieCtrlRegs.PIEACK.all = M_INT11;
}

To compare the CLA result with the old CPU calculation (which was good), here is a graph of rfft.MagBuf for the same 50 Hz sine signal: 1024-sample FFT on the CLA, 2048-sample FFT on the CPU.



Thanks for any advice, Marek.

  • Hello Marek,

    The RFFTmag must be placed in LSx memory if the CLA is going to access it. See section 6.3.1 of the TMS320F28379D Technical reference manual.

    https://www.ti.com/lit/ug/spruhm8i/spruhm8i.pdf

    Also, the following post has more information including links to resources that may help:

    https://e2e.ti.com/support/microcontrollers/c2000/f/c2000-microcontrollers-forum/786227/faq-cla-frequently-asked-questions

    Thanks,

    Ashwini

  • I tried it, but it is even worse. I calculate RFFTmag in the ISR, which runs on CPU1, inside the FFT_Computation_Cla() function with the call RFFT_f32_mag_TMU0(&rfft);

    I don't have space in the LS RAMs. I tried to cut an LS block in half (e.g. LS3) and give the other half to other RAMs (e.g. LS4_5 in my linker file), but this affects all the LS blocks and the CLA program, because they are no longer aligned, and the register setting says LS3 is program memory, not data (I cannot say that half of LS3 is for program and half for data).

    Can an expert check my linker file to see whether everything is OK?

  • Hi Marek,

    LSx blocks cannot be split, they must either be assigned as data or as program. Also, all global variables that will be used on the CLA side must be mapped either to the LSx block designated as data or the CPU-CLA Message RAMs.

    Thanks,
    Ashwini

  • Hello, but it is not possible for me to move all the buffers to LS RAM... How?

    I don't even have the full range (1024) of twiddle factors there. I'm clueless, I don't know what to do about it...

    My memory allocation:





    In the example from Vishal Coelho (CLA_HandsOnWorkshop) the twiddle factors are placed in GS RAM. That makes no sense to me.

  • Hello Marek,

    I went and looked at the CLAHandsOn Solution for the CLA_CODE configuration. In that implementation the two IOBuffers are placed in LS RAM. I believe these are the only shared FFT data structures used on the CLA side. The RFFTmag and RFFTtwiddles sections are placed in GS RAM because only the C28 side of the code needs them when executing the RFFT magnitude in main. Hence the partitioning makes sense. You can follow the same approach if this is what your implementation is doing.

    One question regarding the code snippet you shared for the following two lines. How are Task 1 and Task 8 being triggered? Have you confirmed that the CLA1 task is being run?

    //Trigger set
        DmaClaSrcSelRegs.CLA1TASKSRCSEL1.bit.TASK1  = CLA_TRIG_NOPERPH;
        DmaClaSrcSelRegs.CLA1TASKSRCSEL2.bit.TASK8  = CLA_TRIG_NOPERPH;

    Thanks,

    Ashwini

  • Hello, thanks for the answer. I'm using DMA with ping-pong buffering, so after I fill 1024 samples into a local buffer (e.g. InBuf_Volt1), I copy the values to IOBuffer and trigger Task 1. After Task 1 completes the FFT, I get the CLA ISR, where the program calculates the magnitudes and phases.

    Like this:

    DMA

    while( (Cla1Regs.MIRUN.bit.INT1 == 1) && (ClaComputationDone == false) ){;}
    memcpy(IOBuffer, InBuf_Volt1, sizeof(IOBuffer)); // sizeof(IOBuffer) is already the whole array size; multiplying by sizeof(float32) would copy past the end
        ClaComputationDone = false;
        CLASignalType = 0;
        Cla1ForceTask1();


    CLA FFT:
    __interrupt void Cla1Task1 ( void )
    {
       //__mdebugstop();
    
       CLA_CFFT_run512Pt();
       CLA_CFFT_unpack512Pt();
    }


    CLA ISR:
    __interrupt void cla1Isr1 ()
    {
    
        if(CLASignalType == 0)
        {
            //Volt struct
            FFT_Computation_Cla(&volt1, rfft.MagBuf); //Calculating mag and phs
        }
        else
        {
            if(CLASignalType == 1)
            {
                //Curr struct
                FFT_Computation_Cla(&curr1, rfft.MagBuf); //Calculating mag and phs
            }
        }
    
        ClaComputationDone = true;
        PieCtrlRegs.PIEACK.all = M_INT11;
    }


    Another error: my accessViolationISR detects a CLA violation fault, but in fetchAddressCLAISR I read a value of 0 (so address 0x0?).

  • Hi Marek,

    In line 1 of the DMA code snippet, the while condition is true when CLA Task 1 is running AND the flag is false. I wonder if there could be a race condition where the CLA task ends, the while condition becomes false, and the code continues on before cla1Isr1 is triggered. One suggestion is to change the condition to check only the ClaComputationDone flag and remove the MIRUN.INT1 check.
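The suspected race can be illustrated on a host machine by comparing the two wait conditions as plain predicates. This is a minimal sketch, not target code: the function names are hypothetical, and the two booleans stand in for Cla1Regs.MIRUN.bit.INT1 and ClaComputationDone.

```c
#include <assert.h>
#include <stdbool.h>

/* Wait condition from the posted snippet: keep spinning only while the
 * task is running AND the completion flag is still false. */
static bool must_wait_original(bool mirun_int1, bool computation_done)
{
    assert(!(mirun_int1 && computation_done)); /* state not expected here */
    return mirun_int1 && !computation_done;
}

/* Suggested condition: keep spinning until the CPU-side ISR has set the
 * flag, regardless of what MIRUN reports. */
static bool must_wait_flag_only(bool mirun_int1, bool computation_done)
{
    (void)mirun_int1; /* intentionally ignored */
    return !computation_done;
}
```

The interesting state is "task finished, ISR not yet run" (MIRUN = 0, flag = false): the original condition stops waiting, so IOBuffer could be overwritten before the ISR has read the results, while the flag-only condition keeps waiting. Two caveats for the flag-only version: the flag would have to start out true, otherwise the very first frame waits forever, and it should presumably be declared volatile because it is written in an ISR.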

    Regarding the access violation - is it indicating a CLA read or write or fetch fault?

    Thanks,

    Ashwini

  • Hello, yes, I can change this condition, but it should not be the cause of the problem. I tried the solution project (CLAHandsOn Solution) with my sine signal and got exactly the same results as in my main program. I ran the same signal through the MATLAB FFT for comparison: some samples/harmonics from the CLA are good and some are bad (e.g. in MATLAB fft[1] = 51444.12 and on the CLA fft[1] = 0.0).

    It is probably a fetch fault.

    A friend told me that the twiddle factors must be in LS RAM (check the RFFT_f32.asm file), but the result does not change whether they are in GS or LS.

  • Look, this is my sine LSB table:

    float32 sineLSB[]=
    {
     32768,32969,33170,33371,33572,33774,33975,34176,
     34377,34578,34779,34980,35180,35381,35582,35782,
     35982,36183,36383,36583,36782,36982,37182,37381,
     37580,37779,37978,38177,38375,38573,38771,38969,
     39166,39364,39561,39757,39954,40150,40346,40542,
     40737,40932,41127,41321,41515,41709,41903,42096,
     42288,42481,42673,42864,43056,43247,43437,43627,
     43817,44006,44195,44383,44571,44759,44946,45133,
     45319,45504,45690,45874,46058,46242,46425,46608,
     46790,46972,47153,47333,47513,47693,47872,48050,
     48228,48405,48582,48758,48933,49108,49282,49455,
     49628,49800,49972,50143,50313,50483,50652,50820,
     50988,51155,51321,51487,51651,51815,51979,52142,
     52303,52465,52625,52785,52944,53102,53259,53416,
     53572,53727,53881,54035,54188,54339,54491,54641,
     54790,54939,55087,55234,55380,55525,55669,55813,
     55955,56097,56238,56378,56517,56655,56793,56929,
     57065,57199,57333,57466,57597,57728,57858,57987,
     58115,58242,58368,58493,58618,58741,58863,58984,
     59104,59224,59342,59459,59575,59691,59805,59918,
     60030,60141,60251,60360,60468,60575,60681,60786,
     60890,60993,61095,61195,61295,61393,61491,61587,
     61682,61776,61869,61961,62052,62142,62230,62318,
     62404,62490,62574,62657,62739,62820,62899,62978,
     63055,63131,63206,63280,63353,63425,63495,63565,
     63633,63700,63766,63830,63894,63956,64017,64077,
     64136,64193,64250,64305,64359,64412,64464,64514,
     64563,64611,64658,64704,64748,64791,64834,64874,
     64914,64952,64990,65025,65060,65094,65126,65157,
     65187,65216,65243,65269,65294,65318,65340,65362,
     65382,65401,65418,65435,65450,65464,65476,65488,
     65498,65507,65515,65521,65526,65530,65533,65535,
     65535,65534,65532,65528,65524,65518,65511,65503,
     65493,65482,65470,65457,65442,65427,65410,65391,
     65372,65351,65329,65306,65282,65256,65230,65201,
     65172,65142,65110,65077,65043,65008,64971,64933,
     64894,64854,64813,64770,64726,64681,64635,64587,
     64539,64489,64438,64386,64332,64278,64222,64165,
     64107,64047,63987,63925,63862,63798,63733,63666,
     63599,63530,63460,63389,63317,63244,63169,63093,
     63017,62939,62860,62779,62698,62616,62532,62447,
     62361,62274,62186,62097,62007,61915,61823,61729,
     61635,61539,61442,61344,61245,61145,61044,60942,
     60838,60734,60629,60522,60415,60306,60196,60086,
     59974,59862,59748,59633,59517,59401,59283,59164,
     59044,58924,58802,58679,58556,58431,58305,58179,
     58051,57923,57793,57663,57532,57399,57266,57132,
     56997,56861,56724,56586,56448,56308,56168,56026,
     55884,55741,55597,55452,55307,55160,55013,54865,
     54716,54566,54415,54264,54111,53958,53804,53650,
     53494,53338,53181,53023,52864,52705,52545,52384,
     52223,52060,51897,51734,51569,51404,51238,51071,
     50904,50736,50568,50398,50228,50058,49886,49714,
     49542,49369,49195,49020,48845,48670,48493,48317,
     48139,47961,47782,47603,47424,47243,47062,46881,
     46699,46517,46334,46150,45966,45782,45597,45412,
     45226,45039,44852,44665,44477,44289,44101,43912,
     43722,43532,43342,43151,42960,42769,42577,42385,
     42192,41999,41806,41612,41418,41224,41029,40835,
     40639,40444,40248,40052,39856,39659,39462,39265,
     39068,38870,38672,38474,38276,38077,37879,37680,
     37481,37281,37082,36882,36683,36483,36283,36083,
     35882,35682,35481,35281,35080,34879,34678,34477,
     34276,34075,33874,33673,33472,33271,33069,32868,
     32667,32466,32264,32063,31862,31661,31460,31259,
     31058,30857,30656,30455,30254,30054,29853,29653,
     29452,29252,29052,28852,28653,28453,28254,28054,
     27855,27656,27458,27259,27061,26863,26665,26467,
     26270,26073,25876,25679,25483,25287,25091,24896,
     24700,24506,24311,24117,23923,23729,23536,23343,
     23150,22958,22766,22575,22384,22193,22003,21813,
     21623,21434,21246,21058,20870,20683,20496,20309,
     20123,19938,19753,19569,19385,19201,19018,18836,
     18654,18473,18292,18111,17932,17753,17574,17396,
     17218,17042,16865,16690,16515,16340,16166,15993,
     15821,15649,15477,15307,15137,14967,14799,14631,
     14464,14297,14131,13966,13801,13638,13475,13312,
     13151,12990,12830,12671,12512,12354,12197,12041,
     11885,11731,11577,11424,11271,11120,10969,10819,
     10670,10522,10375,10228,10083,9938,9794,9651,
     9509,9367,9227,9087,8949,8811,8674,8538,
     8403,8269,8136,8003,7872,7742,7612,7484,
     7356,7230,7104,6979,6856,6733,6611,6491,
     6371,6252,6134,6018,5902,5787,5673,5561,
     5449,5339,5229,5120,5013,4906,4801,4697,
     4593,4491,4390,4290,4191,4093,3996,3900,
     3806,3712,3620,3528,3438,3349,3261,3174,
     3088,3003,2919,2837,2756,2675,2596,2518,
     2442,2366,2291,2218,2146,2075,2005,1936,
     1869,1802,1737,1673,1610,1548,1488,1428,
     1370,1313,1257,1203,1149,1097,1046,996,
     948,900,854,809,765,722,681,641,
     602,564,527,492,458,425,393,363,
     334,305,279,253,229,206,184,163,
     144,125,108,93,78,65,53,42,
     32,24,17,11,7,3,1,0,
     0,2,5,9,14,20,28,37,
     47,59,71,85,100,117,134,153,
     173,195,217,241,266,292,319,348,
     378,409,441,475,510,545,583,621,
     661,701,744,787,831,877,924,972,
     1021,1071,1123,1176,1230,1285,1342,1399,
     1458,1518,1579,1641,1705,1769,1835,1902,
     1970,2040,2110,2182,2255,2329,2404,2480,
     2557,2636,2715,2796,2878,2961,3045,3131,
     3217,3305,3393,3483,3574,3666,3759,3853,
     3948,4044,4142,4240,4340,4440,4542,4645,
     4749,4854,4960,5067,5175,5284,5394,5505,
     5617,5730,5844,5960,6076,6193,6311,6431,
     6551,6672,6794,6917,7042,7167,7293,7420,
     7548,7677,7807,7938,8069,8202,8336,8470,
     8606,8742,8880,9018,9157,9297,9438,9580,
     9722,9866,10010,10155,10301,10448,10596,10745,
     10894,11044,11196,11347,11500,11654,11808,11963,
     12119,12276,12433,12591,12750,12910,13070,13232,
     13393,13556,13720,13884,14048,14214,14380,14547,
     14715,14883,15052,15222,15392,15563,15735,15907,
     16080,16253,16427,16602,16777,16953,17130,17307,
     17485,17663,17842,18022,18202,18382,18563,18745,
     18927,19110,19293,19477,19661,19845,20031,20216,
     20402,20589,20776,20964,21152,21340,21529,21718,
     21908,22098,22288,22479,22671,22862,23054,23247,
     23439,23632,23826,24020,24214,24408,24603,24798,
     24993,25189,25385,25581,25778,25974,26171,26369,
     26566,26764,26962,27160,27358,27557,27756,27955,
     28154,28353,28553,28753,28952,29152,29352,29553,
     29753,29953,30154,30355,30555,30756,30957,31158,
     31359,31560,31761,31963,32164,32365,32566,32768
    };


    Matlab FFT (first 10 harm):
    fft[0] = 33553921.000000; fft[1] = 51446.738829; 
    fft[2] = -133.278611;  fft[3] = -112.275631; 
    fft[4] = -106.212885;  fft[5] = -103.780600; 
    fft[6] = -102.510888;  fft[7] = -101.740960; 
    fft[8] = -101.371433; fft[9] = -100.479893;

    -------------------------------------------------------------------------
    CLA FFT (IOBuffer2 after task1 complete):
    fft[0] = 33553920.0; fft[1] = 0.0; 
    fft[2] = 51445.25;  fft[3] = -16768642.0; 
    fft[4] = -133.279663;  fft[5] = 21883.7246; 
    fft[6] = -111.808167;  fft[7] = 12307.0195; 
    fft[8] = -106.212616; fft[9] = 8736.04883;

    So tell me, why are there large values at every second harmonic? Without them it would be a good FFT.
    I have a 2048-point FFT on CPU1 (for test purposes; MATLAB vs. CPU1 matches).

    Libs used in cla project:
    rts2800_fpu32.lib
    c28x_cla_dsp_library_datarom.lib
    c28x_fpu_dsp_library.lib
    IQmath_fpu32.lib
    F2837xRevB_c1bootROM_CLADataROMSymbols_fpu32.lib

  • Hi Marek,

    The CLA FFT API library was created for the workshop example and is not part of our official software support. As such I do not have visibility into the implementation and don't know if this is expected. I am reaching out to other team members and will let you know what I find.

    In the meantime, maybe check the shared datatypes and see that they are the same size on C28 and CLA. You can find information at the following link:

    https://software-dl.ti.com/C2000/docs/cla_software_dev_guide/faq.html#how-are-data-types-different-on-c28x-and-cla

    Thanks,

    Ashwini

  • Thank you very much for looking into it, that helps me a lot. I will wait for a response :)

  • Hi Marek,

    Here is an input I got from the team, can you check if this is indeed the case for you:

    C28 FPU32 RFFT output: real takes up the first half of the buffer and imaginary takes up the 2nd half (in reverse order):

     

    OutBuf[0] = real[0]

    OutBuf[1] = real[1]

    OutBuf[2] = real[2]

    ………

    OutBuf[N/2] = real[N/2]

    OutBuf[N/2+1] = imag[N/2-1]

    ………

    OutBuf[N-3] = imag[3]

    OutBuf[N-2] = imag[2]

    OutBuf[N-1] = imag[1]

     

    CLA RFFT based on CFFT+unpack: real + imaginary pairs:

     

    OutBuf[0] = real[0]

    OutBuf[1] = imag[0]

    OutBuf[2] = real[1]

    OutBuf[3] = imag[1]

    ………

    OutBuf[N-2] = real[N/2-1]

    OutBuf[N-1] = imag[N/2-1]
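The interleaved layout can be checked against the numbers posted earlier in the thread. The sketch below splits the first ten IOBuffer2 values into real and imaginary parts; the helper name is hypothetical and the code is meant for a host PC, not for the CLA.

```c
#include <assert.h>
#include <stddef.h>

/* Split an interleaved buffer {re0, im0, re1, im1, ...} into separate
 * real and imaginary arrays. `bins` is the number of complex pairs. */
static void split_interleaved(const float *in, float *re, float *im,
                              size_t bins)
{
    assert(in != NULL && re != NULL && im != NULL);
    for (size_t k = 0; k < bins; ++k) {
        re[k] = in[2 * k];     /* even index: real part of bin k */
        im[k] = in[2 * k + 1]; /* odd index: imaginary part of bin k */
    }
}

/* First ten IOBuffer2 values quoted earlier in the thread. */
static const float cla_first10[10] = {
    33553920.0f, 0.0f, 51445.25f, -16768642.0f, -133.279663f,
    21883.7246f, -111.808167f, 12307.0195f, -106.212616f, 8736.04883f
};
```

Splitting cla_first10 into five bins gives real parts 33553920.0, 51445.25, -133.28, -111.81 and -106.21, which line up closely with the MATLAB values listed for fft[0] to fft[4]. The "large values in every second harmonic" are then simply the imaginary parts.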

    Thanks,

    Ashwini

  • Hello, I've figured it out now.

    For the phases, should I also run over the whole IOBuffer2? Or should I first copy the real part to a local buffer and run the phase function on that? Or should I just pick out the angles for the real parts after the calculation?

    You shed light on my problem, thank you very much :)

  • Hi Marek,

    Good to know. I am asking the experts about your question and will let you know once I hear back.

    Thanks,

    Ashwini

  • Hi Marek,

    Which API are you using for the magnitude calculation? If it is RFFT_f32_mag_TMU0, the same as in the workshop application, then the IOBuffer2 real and imaginary values will have to be in the same layout as the output of the RFFT on the C28. So yes, the buffer from the CLA CFFT will need to be post-processed and reordered in the same way as well.
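One possible way to do that reordering is sketched below. It assumes the two layouts are exactly as described earlier in this thread, it writes into a separate destination buffer, and it takes the Nyquist bin as an extra parameter because the interleaved buffer of N floats only holds bins 0 to N/2-1. The function name and the nyquist_real parameter are hypothetical, and the sketch has not been run against the TI library.

```c
#include <assert.h>
#include <stddef.h>

/* Reorder {re0, im0, re1, im1, ...} (n floats, n/2 bins) into the layout
 * described for the C28 RFFT output:
 *   out[k]     = real[k]  for k = 0 .. n/2
 *   out[n - k] = imag[k]  for k = 1 .. n/2 - 1
 * Assumption: `out` does not overlap `in`. */
static void interleaved_to_rfft_layout(const float *in, float *out,
                                       size_t n, float nyquist_real)
{
    assert(in != NULL && out != NULL && in != out);
    assert(n >= 4 && n % 2 == 0);

    const size_t half = n / 2;

    for (size_t k = 0; k < half; ++k) {
        out[k] = in[2 * k];         /* real parts, ascending order */
    }
    out[half] = nyquist_real;       /* real[n/2], supplied by the caller */

    for (size_t k = 1; k < half; ++k) {
        out[n - k] = in[2 * k + 1]; /* imaginary parts, reverse order */
    }
}
```

For n = 8 and an input of {10, 0, 11, 21, 12, 22, 13, 23}, the result is {10, 11, 12, 13, nyquist_real, 23, 22, 21}. imag[0] is dropped, which matches the described C28 layout, where only imag[1] to imag[N/2-1] are stored.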

    Thanks,

    Ashwini

  • Hey, yes, I'm using this API. So the real part will be in the first 512 array positions, or not?

    EDIT: Ok i get it. thx

  • Hi Marek,

    Awesome, I will go ahead and close this thread.

    Thanks,
    Ashwini