TMS320F280049C: How to benchmark the C28x + FPU against an ARM Cortex-M4, especially in math performance

mike see1

Part Number: TMS320F280049C

Hello,

Referring to the thread below, I'd like to find out how to benchmark the C28x+FPU against an ARM Cortex-M4.

Is there any example that compares the time taken by the TMS320F280049C (100 MHz, 100 MIPS) with a particular ARM Cortex-M4 model (120 MHz, 200 MIPS) on the same mathematical problem?

Best Regards,

Mike

over 4 years ago

Sira Rao80 over 4 years ago

TI__Mastermind 23200 points

Mike,

I'm not sure I understand your question exactly, but I'll write my thoughts based on what I think you're asking.

To benchmark code on the C28x+FPU, you can use any of four techniques:

1. To get external visibility, for example on an oscilloscope (as many users do), set a GPIO before the code snippet being benchmarked and clear it after the snippet, then probe the GPIO pin with a scope; the pulse width approximates the execution time. The disadvantage of this approach is accuracy: the GPIO set and clear times add to the measurement, so if the code being benchmarked is very short, the accuracy can be significantly affected.

2. Use software breakpoints in CCS: place a breakpoint at the start of the code section you are benchmarking and another just after its end. Then use Run->Clock->Enable in CCS, which displays a clock (cycle count) in the bottom right corner of the CCS window; double-clicking this number resets it to 0. The disadvantage of this technique is that the emulator must be connected. Also, with compiler optimizations enabled, you may not be able to place breakpoints exactly where you need them (you can view the disassembly and place breakpoints there for finer control).

3. Use the C28x CPU timers: start a CPU timer, then read its counter before and after the code section being benchmarked:

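#include <stdint.h>
#include "driverlib.h"   // C2000Ware driverlib: HWREG, CPUTIMER1_BASE, CPUTIMER_O_TIM

// C28_profile_read() - return the current CPU Timer 1 counter value.
// The timer must already be configured and running; it counts down.
static inline uint32_t C28_profile_read(void)
{
    return(HWREG(CPUTIMER1_BASE + CPUTIMER_O_TIM));
}

uint32_t benchmark_section(void)   // illustrative wrapper around the measurement
{
    uint32_t cnt_start = C28_profile_read();   // before the start of the benchmarked section
    // ... code being benchmarked ...
    uint32_t cnt_end = C28_profile_read();     // after the end of the benchmarked section
    // The timer counts down, so start - end is the elapsed count. With a prescaler
    // of 1 this is the number of cycles; scale it to time using the clock speed
    // the device is running at.
    uint32_t tm_new = cnt_start - cnt_end;
    return tm_new;
}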

There are associated initialization functions (the timer must be configured and started before it is read); if you're interested, I can point you to them or share them with you.

4. Use ERAD: moving forward, on newer devices this should be the benchmarking technique of choice. I don't have hands-on experience with it yet, but I can point you in the right direction if you're interested.

"The Embedded Real-Time Analysis and Diagnostic (ERAD) module enhances the debug and system
analysis capabilities of the device by providing additional hardware breakpoints and counters for profiling."

Thanks,

Sira

mike see1 over 4 years ago in reply to Sira Rao80

Prodigy 140 points

Hi Sira,

Yes, the above answers my question as I asked it.

However, for convenience, is there an example that shows the C28x is better and faster than an ARM Cortex-M4 at solving math problems? The thread I linked above claims that the C28x is faster than the Cortex-M4.

Best regards,

Mike

mike see1 over 4 years ago in reply to mike see1

Prodigy 140 points

Hi Sira,

I've found a thread that has some comparison data between the CM4F and the C28x.

This should be the one I'm looking for.

From the table posted by Alex T. on Nov 30, 2010 12:15 AM, I understand that fewer clock cycles is better, but what does "bytes" mean in his table?

https://e2e.ti.com/support/microcontrollers/c2000/f/171/p/21092/277090

I have converted the clock cycles to time taken. Could you take a look and check whether there is any problem with my conversion below?

FIR (block size 32, 32 taps)   Cortex-M4F @ 120 MHz          C28x @ 100 MHz
                               Cycles   Time taken (ms)      Cycles   Time taken (ms)
16-bit fixed-point FIR         2100     17.50                1109     11.09
32-bit fixed-point FIR         2730     22.75                1428     14.28
32-bit floating-point FIR      4750     39.58                1565     15.65

Best Regards,

Mike

Sira Rao80 over 4 years ago in reply to mike see1

TI__Mastermind 23200 points

Mike,

Thanks for sharing this.

In your table, the time taken would be in us, not ms.

Bytes refers to code size, I believe.

Thanks,

Sira
