TMS320F28375D: Problems with running CLA code

Konstantin Beloff

Part Number: TMS320F28375D

Hi!

Thanks to your earlier help, I have fixed my linker file and the memory layout, and simple programs now run flawlessly on the CLA.

But when I run an FFT routine I've written, the problems begin. The code runs fine on the main core (I verified its correctness against MATLAB).

When I first moved the code to the CLA, it did not execute at all, and there were no errors during the build or while debugging. Then I split the FFT into several smaller sub-functions, and the code miraculously started executing, but the output only very loosely resembled the expected result, and for DSP work such poor accuracy is unacceptable.

Here are my two FFT attempts: the version split into sub-functions and the combined single-task version. The CLA code is triggered from the main function by Cla1ForceTask1andWait();

#include "fft_test1_cla.h"
#include "F28x_Project.h"

// First attempt: the FFT split into sub-functions


//FFT variables:
int16_t n, nspan, submatrix, node;   
int16_t N1;// = 8;//1<<logN;       
double temp, angle, realtwiddle, imtwiddle;
int16_t span;

int16_t i;
//Rev bits variables:
int16_t forward, rev, toggle;
int16_t nodd, noddrev;               // to hold bitwise negated or odd values
int16_t  halfn, quartn, nmin1;




void innerloop();
void secondloop();
void outerloop();
void addandsub();
void rotations();

void reverse (double *real, double *output);
__interrupt void Cla1Task1 ( void )
{
    N1=8;
    span=8;
    n=0;
    outerloop();
}


void innerloop()
{
    addandsub();
    rotations();
}

void addandsub()
{
    nspan = n+span;
    temp = real[n] + real[nspan];       // additions & subtractions
    real[nspan] = real[n]-real[nspan];
    real[n] = temp;
    temp = im[n] + im[nspan];
    im[nspan] = im[n] - im[nspan];
    im[n] = temp;
}

void rotations()
{
    angle = primitive_root * node;      // rotations
    realtwiddle = CLAcos(angle);
    imtwiddle = -CLAsin(angle);
    temp = realtwiddle * real[nspan] - imtwiddle * im[nspan];
    im[nspan] = realtwiddle * im[nspan] + imtwiddle * real[nspan];
    real[nspan] = temp;
    n++;
}

void secondloop()
{
    for(submatrix=0; submatrix<(N1>>1)/span; submatrix++)
    {
        for(node=0; node<span; node++)
        {
            innerloop();
        }
    }
}

void outerloop()
{
    for(span=N1>>1; span; span>>=1)      // loop over the FFT stages
    {
        primitive_root = -(3.14159265359/span);
        secondloop();
    }
}

// Second attempt: the whole FFT in a single task
__interrupt void Cla1Task1 ( void )
{
    N1=8;
    span=8;
    n=0;
    for(span=N1>>1; span; span>>=1)      // loop over the FFT stages
    {
        primitive_root = -(3.14159265359/span);   // define MINPI in the header
        //secondloop();
        //__mdebugstop();
        //#pragma MUST_ITERATE(loop2)
        for(submatrix=0; submatrix<(N1>>1)/span; submatrix++)
        {
            // __mdebugstop();

            //#pragma MUST_ITERATE(loop3)
            for(node=0; node<span; node++)
            {
                __mdebugstop();
                nspan = n+span;
                temp = real[n] + real[nspan];       // additions & subtractions
                real[nspan] = real[n]-real[nspan];
                real[n] = temp;
                temp = im[n] + im[nspan];
                im[nspan] = im[n] - im[nspan];
                im[n] = temp;

                angle = primitive_root * node;      // rotations
                realtwiddle = CLAcos(angle);
                imtwiddle = -CLAsin(angle);
                temp = realtwiddle * real[nspan] - imtwiddle * im[nspan];
                im[nspan] = realtwiddle * im[nspan] + imtwiddle * real[nspan];
                real[nspan] = temp;

                n++;   // don't forget to increment n

            } // end of loop over nodes
            __mdebugstop();
            n = (n+span) & (N1-1);   // jump over the odd blocks

        } // end of loop over submatrices

    } // end of loop over FFT stages
    __mdebugstop();
    //reverse(real,outputR);
    //reverse(im, outputI);
}

over 8 years ago

Sal Pezzino over 8 years ago

TI__Mastermind 32995 points

Hi Konstantin,

I did not have a chance to look at your code today. I will take a look tomorrow and get back to you.

It is possible the answers would be different between C28x and CLA. Were you using the FPU on the C28x core?

Regards,
sal

Konstantin Beloff over 8 years ago in reply to Sal Pezzino

Prodigy 230 points

Hi Sal,

I tried the FPU and optimization, both together and one at a time; using the FPU made no notable difference.

Regards,

Konstantin

Sal Pezzino over 8 years ago in reply to Konstantin Beloff

TI__Mastermind 32995 points

I am having some trouble understanding the issues. Can you please clarify?

Is the accuracy sufficient?

Have you looked at the CLA examples to see how to properly configure and start a CLA task?

sal

Konstantin Beloff over 8 years ago in reply to Sal Pezzino

Prodigy 230 points

Sure thing.

I test my algorithm on a simple case: an 8-point real FFT of sin(x). MATLAB and the C28x give the same real and imaginary output (shown in the attached screenshot).

There I get only two significant values, 4i and -4i; the others are approximately zero.

When I move the code to the CLA, I get imaginary output {0,0,0,0,0,0,0,2} and real output {-8e-08,-5e-08,0,-5e-08,0,0,0,0}. So you see, something is wrong with the way the CLA computes. The real part of the CLA output is much larger in magnitude, but values around e-08 can still be treated as zero; the imaginary part, however, goes wild.

And as I've said, the code works on the main core, so it doesn't seem to be the source of the problem.

Regards,

konstantin

Sal Pezzino over 8 years ago in reply to Konstantin Beloff

TI__Mastermind 32995 points

How are real[] and im[] arrays defined?

Can you step through the code and see where it begins to differ?
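One possible way to narrow this down, sketched here as an assumption rather than a TI-provided method: copy real[] and im[] into a debug buffer after each FFT stage on both cores, then compare the snapshots. The helper name `first_mismatch` and the tolerance parameter are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Returns the index of the first element where a[i] and b[i] differ by
 * more than tol, or -1 if the buffers agree everywhere within tol.
 * The test is written as !(diff <= tol) so that a NaN in either buffer
 * is also reported as a mismatch. */
static int first_mismatch(const float *a, const float *b, size_t len, float tol)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!(fabsf(a[i] - b[i]) <= tol)) {
            return (int)i;
        }
    }
    return -1;
}
```

Running it on the snapshot of each stage in turn shows the first stage and element where the two cores diverge, which is usually easier to read than stepping through the disassembly.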

sal

Konstantin Beloff over 8 years ago in reply to Sal Pezzino

Prodigy 230 points

extern double im[];
extern double real[];

They are defined in the main .c file and are located in shared RAM.
I will be able to access the hardware only on Monday.

While stepping through the CLA disassembly, the strange behavior began right from the beginning, after the input variables are set. I mean from this line:
for(span=N1>>1; span; span>>=1)

Sal Pezzino over 8 years ago in reply to Konstantin Beloff

TI__Mastermind 32995 points

This may be a synchronization issue. Make sure that the variables are initialized correctly by either the C28x or the CLA by the time the task begins to execute. Also, make sure the C28x is not modifying those variables while the CLA is operating on them.

For this you can use shared memory or the interrupt lines from the CLA to the C28x to signal when it is safe to modify the variables.
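As an illustration of the shared-memory variant, here is a minimal single-threaded sketch with two sequence counters, each written by only one side. All names (`handshake_t`, `cpu_submit`, `cla_service`, `BUF_LEN`) are hypothetical, the doubling step is placeholder work, and the sketch does not model where the counters and data would have to be placed in memory on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8

typedef struct {
    volatile uint16_t cpu_seq;  /* written only by the C28x side */
    volatile uint16_t cla_seq;  /* written only by the CLA side */
    float data[BUF_LEN];
} handshake_t;

/* C28x side: write new data only when the CLA has finished the previous
 * batch. Returns 1 if the data was handed over, 0 if the CLA is busy. */
static int cpu_submit(handshake_t *h, const float *src)
{
    assert(h != NULL && src != NULL);
    if (h->cpu_seq != h->cla_seq) {
        return 0;                       /* CLA still owns the buffer */
    }
    for (int i = 0; i < BUF_LEN; i++) {
        h->data[i] = src[i];
    }
    h->cpu_seq++;                       /* publish only after the data is in place */
    return 1;
}

/* CLA side: process a pending batch, then signal completion.
 * Returns 1 if a batch was processed, 0 if nothing was pending. */
static int cla_service(handshake_t *h)
{
    assert(h != NULL);
    if (h->cla_seq == h->cpu_seq) {
        return 0;                       /* nothing new from the C28x */
    }
    for (int i = 0; i < BUF_LEN; i++) {
        h->data[i] *= 2.0f;             /* placeholder for the real work */
    }
    h->cla_seq = h->cpu_seq;            /* hand the buffer back */
    return 1;
}
```

Because each counter has a single writer, neither side ever has to modify a flag that the other side owns, and the C28x can tell from `cpu_seq == cla_seq` when it is safe to touch the data again.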

sal

Konstantin Beloff over 8 years ago in reply to Sal Pezzino

Prodigy 230 points

Synchronization does not seem to be the issue. All data is in shared RAM, and once the C28x has loaded it there, only the CLA uses it. The flags change in the right order, so each part of the code has time to execute.

This may sound strange, but could the CLA have problems compiling or executing too many lines of code in a single function? When my whole FFT was in one function, there was no output at all, and after splitting it into 3 "loops" something started to appear on the output. What could cause this?

I am out of ideas. Maybe it is some hardware limitation of the CLA, like an execution time limit that cuts off all "extra" calculations (this is pure speculation; I have no idea about such things), or a lack of program memory (it shouldn't be, it is already 0x1000), or something else.

Sal Pezzino over 8 years ago in reply to Konstantin Beloff

TI__Mastermind 32995 points

There should not be any problem with the function/task being too long. There is nothing limiting the compiler.

Please make sure all the program and data are in the LS RAM blocks available on the part you are using.

In order to find your problem and debug it, you can break up the tasks at different locations to help identify the issue, since you said breaking it up solves the problem. Also, I recommend setting an MDEBUGSTOP in your CLA task to step through the code and see where the problem is.

You can also try combining the code into one function again, and placing some MNOPs within your code to see if that solves the problem instead of breaking up the functions. Can you try this and tell me if you are able to solve the issue with MNOPs instead of breaking up the function please? I am wondering if there is a race condition somewhere and the compiler is not properly issuing instructions on account of the unprotected pipeline of the CLA. There may be some read/write races occurring.
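A rough sketch of what that experiment could look like for the add/subtract step. It assumes the CLA compiler's `__mnop()` intrinsic and the `__TMS320C28XX_CLA__` predefined macro; please check both against your compiler version. The wrapper `CLA_NOP` and the function `addsub_with_nops` are hypothetical names, and the number and position of the NOPs are arbitrary.

```c
#if defined(__TMS320C28XX_CLA__)
#define CLA_NOP() __mnop()      /* emits an MNOP when built for the CLA */
#else
#include <assert.h>             /* host build only, for checks */
#define CLA_NOP() ((void)0)     /* no-op stub so the sketch builds on a PC */
#endif

/* Add/subtract step of one butterfly, with NOPs placed between the
 * store to re[ns] and any later read of the same element. */
static void addsub_with_nops(float *re, int n, int ns)
{
    const float sum = re[n] + re[ns];
    re[ns] = re[n] - re[ns];
    CLA_NOP();
    CLA_NOP();
    CLA_NOP();
    re[n] = sum;
}
```

If the single-function version starts producing correct output with the NOPs in place, that points toward an instruction scheduling problem rather than a limit on function size.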

Let me know what you find.

Regards,
sal
