Tiva instruction rate very slow?

Other Parts Discussed in Thread: TM4C123GH6PGE

Hi,

I have a Tiva TM4C123GH6PGE clocked by an external 25MHz clock module.

I am initialising the clock with...

ROM_SysCtlClockSet(SYSCTL_SYSDIV_4 | SYSCTL_USE_PLL | SYSCTL_OSC_MAIN | SYSCTL_XTAL_25MHZ);

If I run...

int speed;

speed = ROM_SysCtlClockGet();

speed is set to 50,000,000 as expected, so I was hoping for approximately 50 MIPS.

so...

If I then time how long it takes to do 20 integer increments...

int number;

        pinset;      // set a test pin (for the scope)

        number++;
        number++;
        number++;    // ... and so on, 20 increments in total

        pinclear;    // clear the test pin again

These 20 increments take approximately 4 µs, so each increment takes 200 ns, which is 5 MIPS.

There seem to be no references to clock cycles per instruction in the documentation.

What have I done wrong?

Thanks, Richard
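The arithmetic in the question can be reproduced with a small host-side sketch. The helper names are hypothetical, it assumes the 50 MHz value returned by SysCtlClockGet() is the real core clock, and the scope figure also includes whatever the pinset/pinclear macros cost.

```c
/* Hypothetical helpers for turning a scope measurement into throughput. */

/* Millions of operations per second for 'ops' operations taking 'elapsed_s' seconds. */
double measured_mips(double elapsed_s, unsigned ops)
{
    return (double)ops / elapsed_s / 1e6;
}

/* Average system-clock cycles spent per operation. */
double cycles_per_op(double elapsed_s, double clock_hz, unsigned ops)
{
    return elapsed_s * clock_hz / (double)ops;
}
```

With the numbers above, measured_mips(4e-6, 20) is about 5 and cycles_per_op(4e-6, 50e6, 20) is about 10, so each `number++` is averaging roughly ten clock cycles rather than one.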

  • I believe that ARM's documentation and Joseph Yiu's book, "The Definitive Guide to the ARM Cortex-M3" (which, indeed, targeted the now-NRND parts here), provide such detail.

    Professional IDEs (such as IAR) provide a "live" cycle counter. If you wish, I'll repeat your test and report back.
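Without an IDE cycle counter, cycles can also be counted from code using the DWT cycle counter (CYCCNT) of the Cortex-M4 core. A minimal sketch, assuming the ARMv7-M register addresses below and that this device implements CYCCNT; the function names are hypothetical. cyccnt_start() and cyccnt_read() only make sense on the target, while cycles_between() is plain arithmetic.

```c
#include <stdint.h>

/* ARMv7-M debug registers (ARMv7-M Architecture Reference Manual).
   Only dereference these on the target. */
#define DEMCR      (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

#define DEMCR_TRCENA       (1u << 24)  /* enable the DWT/ITM blocks */
#define DWT_CTRL_CYCCNTENA (1u << 0)   /* start the cycle counter */

/* On target: enable and zero the core-clock cycle counter (hypothetical helper). */
void cyccnt_start(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

/* On target: read the free-running 32-bit counter. */
uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT;
}

/* Cycles elapsed between two reads. Unsigned subtraction is modulo 2^32,
   so the result stays correct across one counter wrap-around. */
uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On the board, cyccnt_read() would be called before and after the 20 increments and both values passed to cycles_between(); the result also includes the overhead of the reads themselves.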

  • Hi cb1, thanks for your reply. I'm using Code Composer Studio. Is 200 ns per instruction what you would expect? I have just measured again with a "delay loop" macro:

    #define    delay1ms    for(delay = 0; delay < 2500; delay++) ;

    1 ms / 2500 = 400 ns for a compare and an increment, which confirms the performance I am getting, and it is very disappointing. I would be grateful if you could run a quick test yourself and let me know.

    Thanks, Richard

  • Hi back Richard,

    This, from Joseph's book: "Many instructions (cb1: but not all) are single cycle! This includes multiply."

    Have a client lunch now, but shall return with test results...

  • Hi cb1, many thanks. It seems I should expect instructions to take 20 ns, then.

    I have just repeated my test on a LaunchPad with this code:

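    #include <stdint.h>
    #include <stdbool.h>
    #include "inc/hw_memmap.h"      // GPIO_PORTF_BASE
    #include "driverlib/gpio.h"     // GPIO_PIN_1 and the GPIO API
    #include "driverlib/sysctl.h"
    #include "driverlib/rom.h"      // ROM_ calls; needs a TARGET_IS_... part symbol defined in the project

    // Toggle the red LED (PF1) by writing back the inverted pin state
    #define redLedTog    ROM_GPIOPinWrite(GPIO_PORTF_BASE, GPIO_PIN_1, (ROM_GPIOPinRead(GPIO_PORTF_BASE, GPIO_PIN_1) ^ GPIO_PIN_1))
    // Busy-wait loop; its duration depends on the compiler and optimisation level
    #define    delay1ms    for(delay = 0; delay < 2500; delay++) ;

    uint32_t speed;   // SysCtlClockGet() returns a uint32_t
    int delay;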

    int main(void) {
        ROM_SysCtlClockSet(SYSCTL_SYSDIV_4 | SYSCTL_USE_PLL | SYSCTL_OSC_MAIN | SYSCTL_XTAL_16MHZ);
        ROM_SysCtlPeripheralReset(SYSCTL_PERIPH_GPIOF);
        ROM_SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
        ROM_GPIOPinTypeGPIOOutput(GPIO_PORTF_BASE, GPIO_PIN_1);

        speed = ROM_SysCtlClockGet();

        while(1) {
            redLedTog;
            delay1ms;
        }
    }

    with the same result: speed is set to 50,000,000, but to get 1 ms between LED toggles I need 2500 increment/compare iterations.

    Hope someone can shed some light on this!

    It's beer time over here! I'll try again tomorrow.

    Cheers

    Richard

  • Richard Bland said:
           delay1ms;

    You haven't stated what the compiler optimization level was set to, or what the generated assembler instructions were. I don't currently have a LaunchPad to measure the timings, but using CCS 5.5 I ticked the "Keep the generated assembly language (.asm) file" option under Project Properties -> Build -> ARM Compiler -> Advanced Options -> Assembler Options.

    With the Optimization Level set to Off, the assembler generated for the delay1ms macro was:

    	.dwpsn	file "../project.c",line 45,column 9,is_stmt,isa 1
            LDR       A3, $C$CON8           ; [DPU_3_PIPE] |45| 
            LDR       A1, $C$CON8           ; [DPU_3_PIPE] |45| 
            MOVS      A2, #0                ; [DPU_3_PIPE] |45| 
            STR       A2, [A3, #0]          ; [DPU_3_PIPE] |45| 
            LDR       A1, [A1, #0]          ; [DPU_3_PIPE] |45| 
            MOV       A2, #2500             ; [DPU_3_PIPE] |45| 
            CMP       A2, A1                ; [DPU_3_PIPE] |45| 
            BLE       ||$C$L1||             ; [DPU_3_PIPE] |45| 
            ; BRANCHCC OCCURS {||$C$L1||}    ; [] |45| 
    ;* --------------------------------------------------------------------------*
    ;*   BEGIN LOOP ||$C$L2||
    ;*
    ;*   Loop source line                : 45
    ;*   Loop closing brace source line  : 45
    ;*   Known Minimum Trip Count        : 1
    ;*   Known Maximum Trip Count        : 4294967295
    ;*   Known Max Trip Count Factor     : 1
    ;* --------------------------------------------------------------------------*
    ||$C$L2||:    
            LDR       A2, $C$CON8           ; [DPU_3_PIPE] |45| 
            LDR       A3, $C$CON8           ; [DPU_3_PIPE] |45| 
            LDR       A1, [A2, #0]          ; [DPU_3_PIPE] |45| 
            ADDS      A1, A1, #1            ; [DPU_3_PIPE] |45| 
            STR       A1, [A2, #0]          ; [DPU_3_PIPE] |45| 
            LDR       A1, [A3, #0]          ; [DPU_3_PIPE] |45| 
            MOV       A2, #2500             ; [DPU_3_PIPE] |45| 
            CMP       A2, A1                ; [DPU_3_PIPE] |45| 
            BGT       ||$C$L2||             ; [DPU_3_PIPE] |45| 
            ; BRANCHCC OCCURS {||$C$L2||}    ; [] |45| 
    

    Whereas with the optimization level set to 4 (the maximum), the generated assembler for the delay1ms macro was:

            MOV       V3, #2500             ; [DPU_3_PIPE] |45| 
            MOVS      A1, #0                ; [DPU_3_PIPE] |45| 
    ;* --------------------------------------------------------------------------*
    ;*   BEGIN LOOP ||$C$L3||
    ;*
    ;*   Loop source line                : 45
    ;*   Loop closing brace source line  : 45
    ;*   Known Minimum Trip Count        : 2500
    ;*   Known Maximum Trip Count        : 2500
    ;*   Known Max Trip Count Factor     : 2500
    ;* --------------------------------------------------------------------------*
    ||$C$L3||:    
            ADDS      A1, A1, #1            ; [DPU_3_PIPE] |45| 
            CMP       V3, A1                ; [DPU_3_PIPE] |45| 
            BGT       ||$C$L3||             ; [DPU_3_PIPE] |45| 
            ; BRANCHCC OCCURS {||$C$L3||}    ; [] |45| 
    ;* --------------------------------------------------------------------------*
            STR       A1, [V1, #4]          ; [DPU_3_PIPE] 
    

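    The point I am trying to make is that it is difficult to know how many instruction cycles will be generated for a given set of C statements: the same delay1ms loop body is nine instructions (four loads and a store among them) with optimization off, but only three at level 4.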
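If a C busy-wait loop is kept at all, one common (though still compiler-dependent) precaution is to make the counter volatile. Every iteration must then load and store it, much like the unoptimised listing above, so the optimiser cannot delete the loop or fold it into a constant. A minimal sketch; busy_wait() is a hypothetical name, and the iteration count for a given delay would still have to be measured for each build.

```c
#include <stdint.h>

/* Hypothetical busy-wait: 'volatile' forces a real load and store of the
   counter on every pass, so the loop survives optimisation. The time per
   iteration still depends on the compiler, its settings and flash wait states. */
uint32_t busy_wait(uint32_t iterations)
{
    volatile uint32_t count;

    for (count = 0; count < iterations; count++) {
        /* empty: the work is the volatile increment and compare */
    }
    return count;
}
```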

  • Chester Gillon said:
    The point I am trying to make is that it is difficult to know how many instruction cycles will be generated for a given set of C statements

    Even if you code timing loops in assembler, flash wait states can make the instruction timing slower than expected; see "Inconsistent delay with SysCtlDelay() compared to ROM_SysCtlDelay()" for details.
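TivaWare documents SysCtlDelay() as a loop of three cycles per count, so the nominal count for a delay is clock × time / 3. A minimal host-side sketch of that arithmetic; the helper name is hypothetical, and as noted above, wait states when running from flash can make the real delay longer than this nominal value.

```c
#include <stdint.h>

/* Nominal SysCtlDelay() count for 'delay_us' microseconds at 'clock_hz',
   assuming the documented 3 cycles per loop iteration. */
uint32_t sysctl_delay_count(uint32_t clock_hz, uint32_t delay_us)
{
    /* 64-bit intermediate avoids overflowing clock_hz * delay_us. */
    return (uint32_t)(((uint64_t)clock_hz * delay_us) / (3u * 1000000u));
}
```

On the target this might be used as `ROM_SysCtlDelay(sysctl_delay_count(ROM_SysCtlClockGet(), 1000));` for roughly 1 ms; at 50 MHz the count works out to 16666.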

  • Beyond any limits imposed by a specific (and thus limited) function, there is the timing uncertainty introduced when the MCU's system clock exceeds 40 MHz and the code executes from the MCU's flash memory.

    In brief testing so far, the IAR compiler is yielding far faster program execution.
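A timer-based delay sidesteps both the compiler and flash wait states, because the hardware counts clock cycles rather than instructions. A minimal sketch using the Cortex-M SysTick 24-bit down-counter through its ARMv7-M registers, assuming SysTick is not already in use (e.g. by an RTOS); the function names are hypothetical, and TivaWare's SysTickPeriodSet(), SysTickEnable() and SysTickValueGet() could replace the raw register accesses. Only ticks_elapsed() is pure arithmetic; the other two functions are for the target.

```c
#include <stdint.h>

/* Cortex-M SysTick registers (ARMv7-M). Only dereference these on the target. */
#define SYST_CSR (*(volatile uint32_t *)(uintptr_t)0xE000E010u)
#define SYST_RVR (*(volatile uint32_t *)(uintptr_t)0xE000E014u)
#define SYST_CVR (*(volatile uint32_t *)(uintptr_t)0xE000E018u)

#define SYST_MASK 0x00FFFFFFu          /* the counter is 24 bits wide */

/* On target: free-run SysTick from the core clock (hypothetical helper). */
void systick_start(void)
{
    SYST_RVR = SYST_MASK;              /* maximum reload value */
    SYST_CVR = 0;                      /* any write clears the current value */
    SYST_CSR = (1u << 2) | 1u;         /* CLKSOURCE = core clock, ENABLE */
}

/* Ticks elapsed for a 24-bit down-counter; correct across one reload. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return (start - now) & SYST_MASK;
}

/* On target: spin until 'ticks' core-clock cycles have passed.
   Keep 'ticks' well below 2^24 (about 335 ms at 50 MHz). */
void systick_delay(uint32_t ticks)
{
    uint32_t start = SYST_CVR & SYST_MASK;

    while (ticks_elapsed(start, SYST_CVR & SYST_MASK) < ticks) {
        /* busy-wait on the hardware counter */
    }
}
```

At 50 MHz, `systick_delay(50000)` would then wait about 1 ms regardless of optimisation level.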

  • Thank you all very much for your input. I had completely forgotten about the optimisation settings; I always debug with them off, otherwise single-stepping can be very confusing. However, having now tried my simple delay loop with the optimisation level at 4 and "optimise for speed" set to 5, I find I need to loop 33000 times to get a 1 ms delay. That is 13.2 times faster (33000 / 2500). Wow, amazing! I come from a PIC assembler background, so compiler settings are still a bit mysterious to me. Thank you very much for the support.

    Cheers, Richard