In my application, CLA1 is running an IIR filter. The input x_in is passed from Cpu1 via the Cpu1ToCla1 message RAM, and the output y_out is passed back via the Cla1ToCpu1 message RAM. Here is a simplified example of the IIR code, which is based on the cla_iir2p2z project in C2000Ware (extended to 3p3z):
// global variables, declared elsewhere
// float x_in;   //input to IIR. In Cpu1ToCla1 message RAM
// float xn;     //copy of x_in. Accessed frequently by IIR code
// float D[6];   //shift register/accumulator for IIR
// float A[4];   //denominator coefficients of IIR
// float B[4];   //numerator coefficients of IIR
// float yn;     //output of IIR. Accessed frequently by IIR code
// float y_out;  //copy of yn. In Cla1ToCpu1 message RAM

static inline void run3p3z_CLA(void)
{
    //transposed direct form II
    // Network Diagram :
    //
    // xn------>(x)--->(+)--------------->yn
    //      |    ^      ^             |
    //      |    |      |D[5]         |
    //      |   B(0)   (z)            |
    //      |           ^             |
    //      |           |D[4]         |
    //      |-->(x)--->(+)<-----(x)---|
    //      |    ^      ^        ^    |
    //      |    |      |D[3]    |    |
    //      |   B(1)   (z)      A(1)  |
    //      |           ^             |
    //      |           |D[2]         |
    //      --->(x)--->(+)<-----(x)----
    //      |    ^      ^        ^    |
    //      |    |      |D[1]    |    |
    //      |   B(2)   (z)      A(2)  |
    //      |           ^             |
    //      |           |D[0]         |
    //      --->(x)--->(+)<-----(x)----
    //           ^               ^
    //           |               |
    //          B(3)            A(3)
    //

    xn = x_in; //copy from shared message RAM to local RAM to reduce conflicts between CLA and CPU

    yn = xn*B[0] + D[5];

    D[4] = xn*B[1] + yn*A[1] + D[3];
    D[5] = D[4];

    D[2] = xn*B[2] + yn*A[2] + D[1];
    D[3] = D[2];

    D[0] = xn*B[3] + yn*A[3];
    D[1] = D[0];
}
My question is how I should place the different variables in RAM to get the fastest execution while minimizing accesses to the message RAMs.
Currently I use local copies of x and y (xn and yn), which are in local shared RAM (LS0, for example). That way I access each message RAM only once per iteration.
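The copy-in/copy-out pattern described here can be sketched as a single filter step that reads x_in once and writes y_out once. This is a minimal host-compilable sketch, not the C2000Ware code: the function name iir_step is hypothetical, the final copy to y_out is assumed from the variable descriptions above, and on the target the placement of each variable would be decided by the linker command file rather than by anything shown here.

```c
/* Sketch only. On the target, x_in and y_out would sit in the message RAMs
 * and the remaining variables in LSx RAM; here they are ordinary globals. */
float x_in;        /* written by the CPU (Cpu1ToCla1 message RAM on target) */
float y_out;       /* read by the CPU (Cla1ToCpu1 message RAM on target)    */

float xn, yn;      /* local working copies used by the filter arithmetic    */
float D[6];        /* shift register/accumulator                            */
float A[4], B[4];  /* denominator/numerator coefficients; A[0] is unused    */

/* Hypothetical wrapper: one 3p3z step with a single read of x_in
 * and a single write of y_out. */
void iir_step(void)
{
    xn = x_in;                          /* the only read of x_in   */

    yn = xn*B[0] + D[5];

    D[4] = xn*B[1] + yn*A[1] + D[3];
    D[5] = D[4];

    D[2] = xn*B[2] + yn*A[2] + D[1];
    D[3] = D[2];

    D[0] = xn*B[3] + yn*A[3];
    D[1] = D[0];

    y_out = yn;                         /* the only write of y_out */
}
```

With all A[] set to zero the structure reduces to an FIR filter, so feeding a unit impulse should return B[0] through B[3] in order, which is a quick way to check the state updates on a host before moving to the CLA.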
I also see that my code produces assembly with parallel instructions (MMOV32 || MADDF32), with each half accessing a different RAM address (one is usually from D[], the other from A[] or B[]). If those two addresses are in the same RAM block (LS0), will that result in slower execution due to wait states? Should I therefore put D[] in a separate LSx block from A[] and B[]? The original example project did not explicitly place the shift registers; they are just declared in the .cla source file without a DATA_SECTION #pragma.
Regards,
Mike
Hi Mike,
If the code makes extensive use of two data buffers, putting each buffer in a different RAM block may improve performance. The goal is to reduce pipeline stalls caused by a write and a read that target different buffers in the same cycle.
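As a rough sketch of what that separation could look like in the source, the DATA_SECTION pragma of the TI compiler can assign D[] to one named section and A[] and B[] to another. The section names "IirState" and "IirCoeff" are hypothetical, and the pragma is guarded so that the fragment also compiles on a non-TI host compiler.

```c
/* Sketch only: "IirState" and "IirCoeff" are hypothetical section names.
 * They have no effect until the linker command file maps each one to a
 * different LSx block. */
#if defined(__TI_COMPILER_VERSION__)
#pragma DATA_SECTION(D, "IirState")   /* intended for one LSx block     */
#pragma DATA_SECTION(A, "IirCoeff")   /* intended for a second LSx block */
#pragma DATA_SECTION(B, "IirCoeff")
#endif

float D[6];   /* shift register/accumulator for IIR */
float A[4];   /* denominator coefficients of IIR    */
float B[4];   /* numerator coefficients of IIR      */
```

The two sections would then need matching entries in the SECTIONS part of the project's linker command file, each pointing at a different LSx memory range that is configured for CLA data access. Whether this actually removes stalls is best confirmed by comparing cycle counts for both layouts.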
Thanks,
Ashwini