[FAQ] C28x: What is the DMIPS of a C28x device?

Other Parts Discussed in Thread: TMS320F28377D

Question:

What is the DMIPS of a C28x device?

  • Answer:

    We do not measure the performance of our C2000 MCUs with Dhrystone benchmarks. Dhrystone is not a helpful gauge for comparing CPU architectures, especially for computationally intensive real-time control on an embedded MCU. One concern is that CPU architectures and compilers can be (and have been) designed to score well on Dhrystone yet perform poorly in real-life, real-time applications. Dhrystone also measures only a few basic operations. It does not measure multiply-accumulate, floating-point, SIMD, or many other types of operations needed in mathematically intensive algorithms, many of which are supported by C compilers today.

    Also, the benchmark has lost its credibility for modern CPU architectures because it has no official certification process. Frankly, if the world thought Dhrystone was still relevant, it would have pushed for a certification process long ago. Another serious flaw is that disclosure of the benchmarking environment is not required, and when it is disclosed you will sometimes see special compiler switches that are used only for Dhrystone benchmarking and would not otherwise be used in production development. In short, it is too out of date and too easy to manipulate to be valuable.

    Take the TMS320F28377D as an example. It has dual C28x and dual CLA cores (four cores processing concurrently), and each of these cores can execute one instruction per cycle. Those instructions are C28x or CLA instructions, which are much more powerful for real-time processing than the Dhrystone benchmark creators in the 1980s could have anticipated. More importantly, the C2000 C28x and CLA are especially well suited to real-time control of power electronics circuits. The only relevant benchmark in those cases is performance in the task of controlling power electronics. If you are building an application like that, then some of our benchmark data will be very relevant for you. For example, the '377 can complete a Park Transform in 19 cycles because of the new trigonometry instructions added to the core. This is leveraged in an optimized sensored FOC motor control algorithm that can close the current loop in 1.5 microseconds, and at high loop control bandwidth as well. With a '377D you can control two of these fast current loops on each core and still have CPU bandwidth to spare.

    One more point: although flash cell read accesses are slower than RAM accesses, the way a C2000 is designed means that fetches from flash achieve an access efficiency very close to RAM access speeds. This is because of the creative design of our interface wrapper. As with any pipelined CPU, code discontinuities (branches/exceptions) will impact this performance. So, to get the most performance out of your system, make sure you understand the native C28x instructions and features, the code the compiler generates, and where it locates the data being used.

    Best Regards,

    Brian Fortman

    Industrial Drives and Automation

    C2000 Microcontrollers

  • Thanks Lori and Brian; Very Interesting!