Cycle scavenging on C2000™ MCUs, part 5: TMU and CLA

In parts 1 through 4 of this series, I’ve described several differentiating features built around the analog-to-digital converter (ADC) modules that enable C2000™ microcontrollers (MCUs) to scavenge cycles at the sensing stage. In this installment, I’ll turn my attention to the processing stage and focus on some of the built-in features that enable C2000 MCUs to minimize processing latency and meet control-loop performance demands.

Real-time control systems require fast and efficient processing, with latency kept to a minimum in order to boost overall system performance. To meet that requirement, C28x central processing units (CPUs) on C2000 MCUs integrate a number of hardware accelerators that enable very fast execution of trigonometric functions and complex math operations, saving CPU cycles in the process.

The trigonometric math unit (TMU) is one such hardware accelerator. The TMU assists the main C28x CPU by accelerating operations such as sine, cosine, arctangent and division (1/X) that are otherwise quite cycle-intensive. The cycle-scavenging capabilities of the TMU also extend to many common real-time control algorithms, such as Park and inverse Park transforms, space vector generation, and fast Fourier transform (FFT) magnitude and phase calculations. A Park transform, for example, can easily take anywhere from 80 to more than 100 cycles to execute on an MCU without a TMU. With a TMU, these transforms can be executed in just 13 cycles, yielding up to a 10x performance improvement over competing devices.

Figure 1: TMU performance improvement for park transform

The C2000 compiler also has built-in support for automatic generation of TMU instructions. This means that you can write code in C using math.h functions; the compiler will automatically use TMU instructions where applicable instead of run-time support library calls. The result is significantly fewer cycles per call and much faster trigonometric operations, with no hand-written assembly required.

The control law accelerator (CLA) plays a critical role in scavenging cycles for the main processors. CLAs are independent floating-point processors that have direct access to control peripherals like the ADC and pulse-width modulation (PWM) modules. This enables the CLA to execute real-time control algorithms in parallel with the C28x CPUs, which can effectively double the available processing bandwidth and reduce sample-to-output latency.

Due to its low-latency architecture and ability to directly access control peripherals, the CLA is also able to read the ADC result register in the same cycle in which the ADC conversion completes. This prevents cycles from being wasted and enables just-in-time reading of the ADC, which again reduces the sample-to-output delay and further improves the system response time for higher-frequency control loops.

Figure 2: The CLA can offload intensive signal-processing tasks from the CPU, saving cycles in the process

The CLA does not use interrupts to synchronize with hardware. Instead, it supports up to eight independent tasks, which are each mapped to hardware events such as a timer or data being available on an ADC. A task initiated on the CLA runs to completion without further involvement of the CPU. This is important from a cycle-scavenging perspective because it significantly reduces the burden on the CPU and frees it to perform other system-level tasks, or even manage a second control loop if necessary. Having support for eight separate tasks enables the CLA to support multiple control loops or phases simultaneously. Finally, eliminating interrupts on the CLA eliminates context-switching overhead, which also saves several CPU cycles.
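
The task model described above can be sketched as a small host-side simulation in C. This is not CLA code and it uses no TI API; every name in it (`task_model`, `trigger`, `dispatch_one`) is hypothetical. It assumes a fixed priority order in which the lowest-numbered pending task runs first, which is my understanding of the CLA's scheme but should be checked against the device's technical reference manual.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS 8

typedef void (*task_fn)(void);

/* Simplified model of the task logic: eight handlers plus one
 * pending flag per task. handler[0] models Task 1. */
typedef struct {
    task_fn handler[NUM_TASKS];
    uint8_t pending;   /* bit n set: task n+1 has been triggered */
} task_model;

/* Record a hardware event (for example an ADC end-of-conversion)
 * for task number 1..8. Out-of-range task numbers are ignored. */
static void trigger(task_model *m, int task)
{
    if (task >= 1 && task <= NUM_TASKS) {
        m->pending |= (uint8_t)(1u << (task - 1));
    }
}

/* Run the highest-priority pending task to completion.
 * Returns the task number that ran, or 0 if nothing was pending.
 * There is no nesting: a task is never preempted by another task,
 * so no context needs to be saved or restored. */
static int dispatch_one(task_model *m)
{
    for (int i = 0; i < NUM_TASKS; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if (m->pending & bit) {
            m->pending &= (uint8_t)~bit;   /* clear the flag first */
            if (m->handler[i] != NULL) {
                m->handler[i]();           /* runs start to finish */
            }
            return i + 1;
        }
    }
    return 0;
}
```

If tasks 3, 1 and 8 are triggered in that order, repeated calls to `dispatch_one` run them as 1, 3, 8. Because each handler returns before the next one starts, the model has no stack of saved contexts, which mirrors the point about avoiding context-switching overhead.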

As you can see, the TMU and CLA are very important to the C2000 family of MCUs from a real-time control perspective. These accelerators go a long way in reducing sample-to-output latency and enabling faster system response times. Furthermore, they play a significant role in reducing the burden on the main C28x processors, freeing them to perform other required tasks.

In the sixth installment, I will focus on some of the features built around the PWM modules that enable C2000 MCUs to scavenge cycles at the actuation stage of real-time control systems.
