TMS320F28377D: Optimal RAM usage for CLA operations

Part Number: TMS320F28377D
Other Parts Discussed in Thread: C2000WARE

In my application, CLA1 runs an IIR filter. The input x_in is passed from CPU1 via the Cpu1ToCla1 message RAM, and the output y_out is passed back via the Cla1ToCpu1 message RAM. Here is a simplified example of the IIR code, which is based on the cla_iir2p2z project in C2000Ware (extended to 3p3z):

// global variables, declared elsewhere
// float x_in; //input to IIR. In Cpu1ToCla1 message RAM
// float xn; //copy of x_in. Accessed frequently by IIR code
// float D[6]; //shift register/accumulator for IIR
// float A[4]; //denominator coefficients of IIR
// float B[4]; //numerator coefficients of IIR
// float yn; //output of IIR. Accessed frequently by IIR code
// float y_out; //copy of yn. In Cla1ToCpu1 message RAM
static inline void run3p3z_CLA(void)
{ //transposed direct form II
// Network Diagram :
//
// xn------>(x)--->(+)--------------->yn
// | ^ ^ |
// | | |D[5] |
// | B(0) (z) |
// | ^ |
// | |D[4] |
// |-->(x)--->(+)<-----(x)---|
// | ^ ^ ^ |
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

My question is how I should place the different variables in RAM to get the fastest execution and minimize accesses to the message RAMs.

Currently I use local copies of x and y (xn and yn), which are in local shared RAM (LS0, for example). That way I only access each message RAM once per iteration.
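The copy-in/copy-out pattern described above could be sketched roughly as follows. This is only an illustration under assumptions: the variable names mirror the comments in the listing, while cla_task_body() and filter_step() are hypothetical names, and filter_step() is a stand-in gain rather than the actual 3p3z filter.

```c
/* Sketch only: names follow the listing; the function names are hypothetical. */
float x_in;        /* assumed to be placed in the Cpu1ToCla1 message RAM */
float y_out;       /* assumed to be placed in the Cla1ToCpu1 message RAM */
static float xn;   /* local copy of x_in, assumed to be in an LSx data RAM */
static float yn;   /* local result, assumed to be in an LSx data RAM */

/* Stand-in for the real filter: a plain gain so the sketch is self-contained. */
static void filter_step(void)
{
    yn = 0.5f * xn;
}

/* Hypothetical task body: each message RAM is touched exactly once. */
void cla_task_body(void)
{
    xn = x_in;       /* single read from the CPU-to-CLA message RAM */
    filter_step();   /* all repeated accesses use xn and yn only */
    y_out = yn;      /* single write to the CLA-to-CPU message RAM */
}
```

With this structure, the number of message RAM accesses per iteration does not depend on how often the filter code reads xn or updates yn.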

I also see that my code produces assembly with parallel instructions (MMOV32 || MADDF32), each accessing a different RAM address (one is usually from D[], the other from A[] or B[]). If those two addresses are in the same RAM block (LS0), will that result in slower execution due to wait states? Should I therefore put D[] in a separate LSx block from A[] and B[]? The original example project did not explicitly place the shift registers; they are just declared in the .cla source file without a DATA_SECTION #pragma.

Regards,

Mike

  • Hi Mike,

    If the code makes extensive use of two data buffers, putting each buffer in a different RAM block may improve performance. The goal is to reduce pipeline stalls caused by a write and a read to different buffers occurring in the same cycle.

    Thanks,

    Ashwini
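One way the suggestion above might be applied is to pin the shift register and the coefficient arrays to different LSx blocks with DATA_SECTION pragmas. The sketch below is an assumption-laden illustration, not a confirmed fix: the section names "CLADataLS0" and "CLADataLS1" are placeholders that must match output sections defined in the project's linker command file, and both RAM blocks would also need to be assigned to the CLA as data memory.

```c
/*
 * Sketch only: the section names are assumptions and must match the
 * linker command file (for example, sections mapped to RAMLS0 and RAMLS1).
 * The guard lets the file build with non-TI compilers, which do not
 * recognize the DATA_SECTION pragma.
 */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(D, "CLADataLS0")   /* shift register in one LS block */
#pragma DATA_SECTION(A, "CLADataLS1")   /* coefficients in a different block */
#pragma DATA_SECTION(B, "CLADataLS1")
#endif

float D[6];   /* shift register/accumulator for the IIR */
float A[4];   /* denominator coefficients of the IIR */
float B[4];   /* numerator coefficients of the IIR */
```

Whether this actually removes stalls for a given MMOV32 || MADDF32 pair would need to be confirmed by measuring the task's cycle count before and after the change.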