TMS320F28075: Writing more efficient C code on CLA

Part Number: TMS320F28075


Hi Champ,

I am asking on behalf of my customer.

My customer is familiar with C, and their control loop already uses over 90% of the available CLA cycles, yet there is still much more they need to fit into the control loop.

They have tried raising the compiler optimization level, but the improvement is limited.

So they want to rework the C code running on the CLA from the fundamentals and write it in a more efficient and proper way.

Do we have a guideline on C coding style for the CLA that users can follow? It would really help users avoid coding patterns that are not recommended on the CLA.

Thanks and regards,

Johnny 

  • Hi Johnny,

    We have a SW developer's guide available; however, it seems the customer is already quite familiar with the CLA.

    I will put together some ideas and I will post them tomorrow (Friday).

    Regards,

    Lori

  • Some things to keep in mind when developing for CLA:

    • CLA is smaller than the C28x FPU - There are fewer floating-point result registers and fewer aux registers. 
    • CLA has a smaller instruction set than the C28x FPU.
      • The CLA's instruction set is focused on floating-point math.
      • CLA has integer add/sub but not as many variations as the C28x FPU. It does not have an integer multiply.
      • Some operations like modulus (x = y%1000) are supported, but unlike the C28x, the CLA has no special instruction to help with this calculation.
    • CLA does not have a RPT block. This means loops incur branch overhead on every iteration, which the C28x FPU can avoid.
    • CLA does not have a TMU, which can have an impact on sin/cos/atan type operations.
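
As a rough illustration of two of the points above (per-iteration branch overhead and the cost of `%`), here is a minimal sketch. All names (`fir4_unrolled`, `fir4_loop`, `next_index`, `next_index_mod`, `NUM_TAPS`, `BUF_LEN`) are hypothetical and do not come from any TI library. Whether the rewritten forms actually save cycles depends on the compiler version and options, so it is worth checking the generated assembly.

```c
#define NUM_TAPS 4      /* hypothetical filter length */
#define BUF_LEN  1000   /* hypothetical circular buffer length */

/* Loop form: one compare and branch per tap. */
static float fir4_loop(const float *x, const float *c)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < NUM_TAPS; i++) {
        acc += x[i] * c[i];
    }
    return acc;
}

/* Manually unrolled form: same arithmetic, no loop counter and no branch. */
static float fir4_unrolled(const float *x, const float *c)
{
    return x[0] * c[0] + x[1] * c[1] + x[2] * c[2] + x[3] * c[3];
}

/* Wrapping index using modulus. */
static int next_index_mod(int idx)
{
    return (idx + 1) % BUF_LEN;
}

/* Same wrap using a signed compare and reset instead of modulus.
 * Assumes 0 <= idx < BUF_LEN on entry. */
static int next_index(int idx)
{
    idx++;
    if (idx >= BUF_LEN) {
        idx = 0;
    }
    return idx;
}
```

The unrolled form only makes sense for short, fixed-length loops, since it trades program memory for cycles.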

    Ideas:

    • Look for opportunities where the CLA can help the C28x by performing floating-point computations in parallel with the C28x.
    • The CLA does not have the same read-modify-write operations the C28x has.
      • This can cause extra cycles accessing .bit fields (using our bit-field structures).
      • One solution is to use .all to write/read to a whole register instead as described in this E2E post. Most ePWM registers that are updated during runtime can be accessed as .all instead of by .bit.
    • A lot of decisions (if/else/else type code) can reduce performance on both the C28x and the CLA. Look for ways to change the logic to reduce the decisions or to use more than one task instead of having a long list of decisions in one task. 
    • CLA does not have instructions to support unsigned integer compares. Try performing compares with floating point numbers. 
    • Reduce the usage of pointers. Example:
      • Current code: A structure is used by the CLA and the structure is accessed a number of times during a task.
      • Try this: read from the structure once, use the local variable and then write back to the structure at the end. (Note: I've seen this improve performance on the C28x as well)
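
The `.all` idea and the "read once, use locals, write back once" idea can be sketched roughly as follows. The register `DEMO_CTL_REG`, its fields, and the `PiState` structure are hypothetical stand-ins, not actual device registers or TI code. The sketch also assumes bit-fields are allocated from bit 0 upward.

```c
#include <stdint.h>

/* Hypothetical register in the style of the bit-field headers. */
struct DEMO_CTL_BITS {
    uint16_t MODE     : 2;   /* bits 1:0  */
    uint16_t ENABLE   : 1;   /* bit  2    */
    uint16_t PRESCALE : 3;   /* bits 5:3  */
    uint16_t rsvd     : 10;  /* bits 15:6 */
};
union DEMO_CTL_REG {
    uint16_t all;
    struct DEMO_CTL_BITS bit;
};

/* Three .bit writes: each one is a separate read-modify-write. */
static void ctl_write_bits(volatile union DEMO_CTL_REG *reg)
{
    reg->bit.MODE = 2;
    reg->bit.ENABLE = 1;
    reg->bit.PRESCALE = 5;
}

/* One .all write: the value is composed locally and stored once.
 * Note that this overwrites every field in the register. */
static void ctl_write_all(volatile union DEMO_CTL_REG *reg)
{
    reg->all = (uint16_t)((2u << 0) | (1u << 2) | (5u << 3));
}

/* Hypothetical controller state used by a CLA task. */
typedef struct {
    float kp, ki, integ, limit, out;
} PiState;

/* Every access goes through the structure pointer. */
static void pi_step_direct(PiState *s, float err)
{
    s->integ += s->ki * err;
    if (s->integ > s->limit) {
        s->integ = s->limit;
    }
    s->out = s->kp * err + s->integ;
}

/* Read once into locals, compute, then write back once at the end. */
static void pi_step_local(PiState *s, float err)
{
    float integ = s->integ + s->ki * err;
    float limit = s->limit;
    if (integ > limit) {
        integ = limit;
    }
    s->integ = integ;
    s->out = s->kp * err + integ;
}
```

Both `pi_step_*` functions compute the same result; the second simply makes the single read and single write-back explicit rather than leaving it to the compiler.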

    Best regards,

    Lori