TMS320F280049C: How to benchmark C28x + FPU with ARM Cortex M4, especially in math capability.

Part Number: TMS320F280049C

Hello,

Referring to the thread below, how can I benchmark the C28x+FPU against an ARM Cortex-M4?

Is there an example that compares the time taken by the TMS320F280049C (100 MHz, 100 MIPS) with that of a particular ARM Cortex-M4 (120 MHz, 200 MIPS) on the same mathematical problem?

Best Regards,

Mike

  • Mike,

    I'm not sure I understand your question exactly, but I'll write my thoughts based on what I think you're asking.

    To benchmark code on C28x+FPU, you can do one of 4 things:

    1. To get external visibility, as many users do, you could set a GPIO before the code snippet being benchmarked and clear it after the snippet, then connect the GPIO signal to an oscilloscope. The disadvantage is accuracy - the GPIO set and clear times are included in the measurement, so if the code you are benchmarking is very short, the accuracy could be significantly impacted.

    2. Use software breakpoints in CCS - place a breakpoint at the start of the code section you are benchmarking, and another breakpoint just after the end of the code section. Then use Run->Clock->Enable in CCS, and this will display a clock at the bottom right corner of the CCS window. Double-clicking this number will clear it to 0. The disadvantage of this technique is that you need the emulator connected. Also, if you are running code with compiler optimizations enabled, you may not be able to place breakpoints exactly where you need them. (You could view the disassembly and place breakpoints there for better control.)

    3. Use the C28x CPU timers - read the timer counter immediately before and after the code section:

    cnt_start = C28_profile_read(); // before start of benchmarking section
    cnt_end = C28_profile_read(); // after end of benchmarking section
    tm_new = cnt_start - cnt_end; // number of cycles (the CPU timer counts down, hence start minus end); can be scaled to time using the clock speed the device is running at

    // PROFILE_READ
    // C28_profile_read() - return the CPU Timer counter value
    //
    static inline uint32_t C28_profile_read(void)
    {
        return HWREG(CPUTIMER1_BASE + CPUTIMER_O_TIM);
    }

    There are associated initialization functions - if you're interested, I can point you to those or share them with you.

    4. Use ERAD - moving forward with newer devices, this should be the benchmarking technique of choice. I don't have hands-on experience with this yet, but I can point you in the right direction, if interested.

    "The Embedded Real-Time Analysis and Diagnostic (ERAD) module enhances the debug and system
    analysis capabilities of the device by providing additional hardware breakpoints and counters for profiling."
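
    For option 3, the subtraction and the measurement overhead can be sketched in plain C as below. This is only an illustration under stated assumptions: elapsed_cycles() is a hypothetical helper (not part of any TI library), the timer is assumed to count down once per CPU cycle with its period set to 0xFFFFFFFF, and the overhead value is assumed to come from two back-to-back reads with no code in between.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /*
     * Hypothetical helper, not a TI API: elapsed CPU cycles between two reads
     * of a down-counting 32-bit timer.
     *
     * cnt_start - cnt_end is computed modulo 2^32, so the result stays correct
     * across a single reload, ASSUMING the timer period is 0xFFFFFFFF.
     * overhead is the cost of the read itself, measured separately.
     */
    static inline uint32_t elapsed_cycles(uint32_t cnt_start,
                                          uint32_t cnt_end,
                                          uint32_t overhead)
    {
        uint32_t raw = cnt_start - cnt_end; /* down-counter: start minus end */

        assert(raw >= overhead);            /* overhead cannot exceed the raw count */
        return raw - overhead;
    }
    ```

    On the device, cnt_start and cnt_end would be the two C28_profile_read() values shown in option 3. Subtracting the read overhead matters most for very short code sections, for the same reason the GPIO method loses accuracy there.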

    Thanks,

    Sira

  • Hi Sira,

    Yes, the above answers my question as asked.

    However, for convenience, is there any example that shows the C28x is better and faster at solving math problems than the ARM Cortex-M4? In the thread I referred to above, there was a claim that the C28x is faster than the Cortex-M4.

    Best regards,

    Mike

  • Hi Sira,

    I've found a thread which has some comparison data between CM4F and C28x.

    This should be the one that I'm looking for.

    From the table provided by Alex T. on Nov 30, 2010 12:15 AM, I understand that fewer clock cycles is better, but what does "bytes" mean in his table?

    https://e2e.ti.com/support/microcontrollers/c2000/f/171/p/21092/277090

    I have converted the clock cycles to time taken. Could you take a look and check whether there is any problem with my conversion below?

    FIR (32 block, 32 taps)    Cortex-M4F (120 MHz)         C28x (100 MHz)
                               Cycles   Time taken (ms)     Cycles   Time taken (ms)
    16-bit fixed pt FIR        2100     17.50               1109     11.09
    32-bit fixed pt FIR        2730     22.75               1428     14.28
    32-bit floating pt FIR     4750     39.58               1565     15.65

    Best Regards,

    Mike

  • Mike,

    Thanks for sharing this.

    In your table, the time taken would be in us (microseconds), not ms. For example, 2100 cycles at 120 MHz is 2100 / 120 = 17.5 us.

    Bytes refers to code size, I believe.
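
    The cycle-to-time conversion can be written as a small C helper, sketched below. cycles_to_us() is a hypothetical name introduced only for illustration, and the clock frequencies passed to it are the 120 MHz and 100 MHz figures quoted in this thread.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /*
     * Hypothetical helper for illustration: convert a cycle count to
     * microseconds for a CPU running at clock_hz.
     *
     * time [us] = cycles / clock_hz [Hz] * 1e6
     */
    static inline double cycles_to_us(uint32_t cycles, double clock_hz)
    {
        assert(clock_hz > 0.0); /* a zero or negative clock makes no sense */
        return (double)cycles * 1.0e6 / clock_hz;
    }
    ```

    With the figures from the table, cycles_to_us(4750, 120.0e6) is about 39.58 us for the Cortex-M4F and cycles_to_us(1565, 100.0e6) is about 15.65 us for the C28x on the 32-bit floating point FIR.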

    Thanks,

    Sira