In my application, CLA1 runs an IIR filter. The input x_in is passed from Cpu1 via the Cpu1ToCla1 message RAM, and the output y_out is passed back via the Cla1ToCpu1 message RAM. Here is a simplified example of the IIR code, which is based on the cla_iir2p2z project in C2000Ware (extended to 3p3z):
// global variables, declared elsewhere
// float x_in; //input to IIR. In Cpu1ToCla1 message RAM
// float xn; //copy of x_in. Accessed frequently by IIR code
// float D[6]; //shift register/accumulator for IIR
// float A[4]; //denominator coefficients of IIR
// float B[4]; //numerator coefficients of IIR
// float yn; //output of IIR. Accessed frequently by IIR code
// float y_out; //copy of yn. In Cla1ToCpu1 message RAM
static inline void run3p3z_CLA(void)
{   //transposed direct form II
    // Network Diagram :
    //
    // xn------>(x)--->(+)--------------->yn
    //      |    ^      ^             |
    //      |    |      |D[5]         |
    //      |  B(0)    (z)            |
    //      |           ^             |
    //      |           |D[4]         |
    //      |-->(x)--->(+)<-----(x)---|
    //      |    ^      ^        ^    |
    //      |    |      |D[3]    |    |
    //      |  B(1)    (z)     A(1)   |
    //      |           ^             |
    //      |           |D[2]         |
    //      --->(x)--->(+)<-----(x)----
    //      |    ^      ^        ^    |
    //      |    |      |D[1]    |    |
    //      |  B(2)    (z)     A(2)   |
    //      |           ^             |
    //      |           |D[0]         |
    //      --->(x)--->(+)<-----(x)----
    //           ^               ^
    //           |               |
    //         B(3)            A(3)
    //
    xn = x_in; //copy from shared message ram to local ram to reduce conflicts between cla and cpu!!!!
    yn = xn*B[0] + D[5];
    D[4] = xn*B[1] + yn*A[1] + D[3];
    D[5] = D[4];
    D[2] = xn*B[2] + yn*A[2] + D[1];
    D[3] = D[2];
    D[0] = xn*B[3] + yn*A[3];
    D[1] = D[0];
}
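To check the 3p3z structure off-target, the same arithmetic can be compared against the plain difference equation on a PC. The following is a minimal sketch under stated assumptions: `run3p3z_host`, `iir_ref`, and `max_abs_diff` are hypothetical names, the coefficient values are arbitrary placeholders chosen only to give a stable filter, and the copy `y_out = yn` is assumed to happen once per sample (it is not shown in the simplified code above).

```c
/* Host-side stand-ins for the CLA globals; on a PC there is no message RAM. */
static float x_in, xn, yn, y_out;
static float D[6];

/* Placeholder coefficients (assumption, not from the original project).
   A[0] is unused; feedback terms are added (yn*A[k]), as in the code above. */
static const float A[4] = {0.0f, 0.5f, -0.25f, 0.125f};
static const float B[4] = {0.2f, 0.3f, 0.1f, 0.05f};

/* Same arithmetic as run3p3z_CLA(), plus the assumed copy-out to y_out. */
static void run3p3z_host(void)
{
    xn = x_in;
    yn = xn*B[0] + D[5];
    D[4] = xn*B[1] + yn*A[1] + D[3];
    D[5] = D[4];
    D[2] = xn*B[2] + yn*A[2] + D[1];
    D[3] = D[2];
    D[0] = xn*B[3] + yn*A[3];
    D[1] = D[0];
    y_out = yn;
}

/* Reference: y[n] = sum_{k=0..3} B[k]*x[n-k] + sum_{k=1..3} A[k]*y[n-k]. */
static float iir_ref(float x, float xh[3], float yh[3])
{
    float y = B[0]*x + B[1]*xh[0] + B[2]*xh[1] + B[3]*xh[2]
            + A[1]*yh[0] + A[2]*yh[1] + A[3]*yh[2];
    xh[2] = xh[1]; xh[1] = xh[0]; xh[0] = x;   /* shift input history  */
    yh[2] = yh[1]; yh[1] = yh[0]; yh[0] = y;   /* shift output history */
    return y;
}

/* Feed a unit step to both versions and return the largest |difference|. */
float max_abs_diff(int n)
{
    float xh[3] = {0.0f, 0.0f, 0.0f};
    float yh[3] = {0.0f, 0.0f, 0.0f};
    float worst = 0.0f;
    int i;

    for (i = 0; i < 6; i++) {
        D[i] = 0.0f;                           /* start both filters from rest */
    }
    for (i = 0; i < n; i++) {
        float d;
        x_in = 1.0f;
        run3p3z_host();
        d = y_out - iir_ref(1.0f, xh, yh);
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

The two versions sum their terms in a different order, so small rounding differences are expected; the comparison should use a tolerance rather than exact equality.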
My question is how I should place the different variables in RAM to keep execution as fast as possible while minimizing accesses to the message RAMs.
Currently I use local copies of x and y (xn and yn), which are in a local shared RAM block (LS0, for example). That way I only access each message RAM once per iteration.
I also see that my code produces assembly with parallel instructions (MMOV32 || MADDF32), with each half accessing a different RAM address (one is usually in D[], the other in A[] or B[]). If those two addresses are in the same RAM block (LS0), will that result in slower execution due to wait states? Should I therefore put D[] in a separate LSx block from A[] and B[]? The original example project did not explicitly place the shift registers; they are just declared in the .cla source file without a DATA_SECTION #pragma.
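For concreteness, the split placement being asked about could be written roughly as follows. This is only a sketch of the option, not a confirmed fix: the section names are assumptions and must match output sections in the project's linker command file, with "CLADataLS0" and "CLADataLS1" mapped to two different LSx blocks. Whether this layout changes the cycle count is exactly the open question.

```c
/* Sketch only: section names are assumptions and must match the output
   sections defined in the project's linker command file. */
#if defined(__TMS320C28XX__)   /* TI C28x compiler; skipped on a host build */
#pragma DATA_SECTION(x_in,  "CpuToCla1MsgRAM")  /* Cpu1 -> CLA1 message RAM */
#pragma DATA_SECTION(y_out, "Cla1ToCpuMsgRAM")  /* CLA1 -> Cpu1 message RAM */
#pragma DATA_SECTION(D,     "CLADataLS0")       /* delay line in one LSx block */
#pragma DATA_SECTION(xn,    "CLADataLS0")
#pragma DATA_SECTION(yn,    "CLADataLS0")
#pragma DATA_SECTION(A,     "CLADataLS1")       /* coefficients in another block */
#pragma DATA_SECTION(B,     "CLADataLS1")
#endif

/* Each pragma must appear before the definition it applies to. */
float x_in;    /* input to IIR, written by Cpu1           */
float y_out;   /* output of IIR, read by Cpu1             */
float D[6];    /* shift register/accumulator for IIR      */
float xn;      /* local copy of x_in                      */
float yn;      /* local copy of the output before copy-out */
float A[4];    /* denominator coefficients                */
float B[4];    /* numerator coefficients                  */
```

Both LSx blocks would also have to be assigned to the CLA as data RAM in the memory configuration for the CLA to read and write them.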
Regards,
Mike