RM46L852: Why does it take 40 clock cycles to read a register? [Title edited]

Part Number: RM46L852
Other Parts Discussed in Thread: HALCOGEN

My RM46 Launchpad seems to take an awfully long time reading a register.
My measurements indicate that it takes 42 clock cycles to read gioREG->GCR0 and store it in a variable on the stack.

The prime suspect is the compiler/debugger. I suspect that some debug feature for register data is slowing things down, but I can't find any setting in the IDE that seems a plausible cause.

IDE: IAR EWARM FS 7.40 (With setup from HALCoGen) No optimizing.
OS: FreeRTOS
GCLK = 160 MHz.
Method of measurement: Set a pin, read a register and clear the pin again. The duration of the logic high indicates the register read time.

Test code: Two for loops with 150 iterations each (FIFO_DEPTH).
Loop one is a reference which just toggles a pin. This measures how long it takes to toggle a pin.
Loop two is the measurement.

/*************************************************************************************************************/

while (1) {
    /* Block until another task or ISR notifies this task. */
    xTaskNotifyWait( 0x00, 0, &ulNotifiedValue, portMAX_DELAY );
    volatile uint32_t temp = 0;
    int i;

    /* Loop one (reference): toggle the clock pin only. */
    gioPORTA->DCLR = 1 << FIFO_CHA_READEN;
    for (i = 0; i < FIFO_DEPTH; i++) {
        gioPORTA->DSET = 1 << FIFO_CHA_CLK;
        gioPORTA->DCLR = 1 << FIFO_CHA_CLK;
    }
    gioPORTA->DSET = 1 << FIFO_CHA_READEN;

    /* Loop two (measurement): same toggle with one register read in between. */
    gioPORTA->DCLR = 1 << FIFO_CHB_READEN;
    for (i = 0; i < FIFO_DEPTH; i++) {
        gioPORTA->DSET = 1 << FIFO_CHB_CLK;
        temp = gioREG->GCR0;   /* the read being timed */
        gioPORTA->DCLR = 1 << FIFO_CHB_CLK;
    }
    gioPORTA->DSET = 1 << FIFO_CHB_READEN;
}

/*************************************************************************************************************/

The reference loop has a logic-high duration of 20 nanoseconds.
The measurement loop has a logic-high duration of 280 nanoseconds.
If this means that the register read takes 260 nanoseconds at a clock speed of 160 MHz, then it took ~42 clock cycles to complete the read and store operation.
Is this normal?
Where do I go from here? Am I right to suspect the compiler/debugger?
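
For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion from pulse widths to clock cycles. The function name `extra_cycles` is made up for illustration; the inputs are the 20 ns and 280 ns pulse widths and the 160 MHz GCLK quoted above.

```c
#include <assert.h>

/* Convert the extra logic-high time of the measurement loop into CPU cycles.
   ref_high_ns:  pulse width of the reference loop (pin toggle only)
   meas_high_ns: pulse width of the measurement loop (toggle + register read)
   clk_mhz:      CPU clock in MHz
   extra_cycles is a hypothetical helper name, not part of HALCoGen. */
static double extra_cycles(double ref_high_ns, double meas_high_ns, double clk_mhz)
{
    assert(clk_mhz > 0.0);
    double delta_ns = meas_high_ns - ref_high_ns;  /* time added by read + store */
    return delta_ns * clk_mhz / 1000.0;            /* ns * MHz / 1000 = cycles */
}
```

With the values from this post, `extra_cycles(20.0, 280.0, 160.0)` gives 41.6, which is where the "~42 clock cycles" comes from. Note that the figure covers all three instructions between the two pin writes (literal load, register read, store to the stack), not the peripheral read alone.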

/*************************************************************************************************************/

Here's the assembly for loop 2 (the measurement loop)

;gioPORTA->DSET = 1 << FIFO_CHB_CLK;

0xbc9c: 0x2120 MOVS R1, #32 ; 0x20
0xbc9e: 0x4a08 LDR.N R2, [PC, #0x20] ; [0xbcc0] GIOSETA
0xbca0: 0x6011 STR R1, [R2]

;temp = gioREG->GCR0;

0xbca2: 0x4908 LDR.N R1, [PC, #0x20] ; [0xbcc4] 0xfff7bc00 (-541696)
0xbca4: 0x6809 LDR R1, [R1]
0xbca6: 0x9100 STR R1, [SP]

;gioPORTA->DCLR = 1 << FIFO_CHB_CLK;

0xbca8: 0x2120 MOVS R1, #32 ; 0x20
0xbcaa: 0x4a04 LDR.N R2, [PC, #0x10] ; [0xbcbc] 0xfff7bc44 (-541628)
0xbcac: 0x6011 STR R1, [R2]

/*************************************************************************************************************/

  • Here's a cutout from the measurement loop: [image]

    Here's a bird's-eye view of both loops. 11.49 us and 48.88 us are the run times of each loop (150 iterations): [image]

    Here's a cutout from the reference loop: [image] (I am not sure exactly why it sometimes takes longer; it might be the RTOS. This is not a problem in itself, unless it is a symptom of a deeper problem.)
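
The two loop run times give an independent cross-check of the per-read cost. The sketch below assumes that 11.49 us belongs to the reference loop and 48.88 us to the measurement loop (the order in which the loops run), and that the only difference between the loops is the register read. The function name is made up for illustration.

```c
#include <assert.h>

/* Estimate the cycles added per iteration from the total run time of both loops.
   ref_total_us / meas_total_us: total run time of each loop in microseconds
   iterations: loop count (FIFO_DEPTH, 150 in this thread)
   clk_mhz: CPU clock in MHz
   cycles_per_read_from_totals is a hypothetical helper name. */
static double cycles_per_read_from_totals(double ref_total_us, double meas_total_us,
                                          int iterations, double clk_mhz)
{
    assert(iterations > 0);
    double delta_us = (meas_total_us - ref_total_us) / iterations; /* extra time per pass */
    return delta_us * clk_mhz;                                     /* us * MHz = cycles */
}
```

With 11.49 us, 48.88 us, 150 iterations and 160 MHz this comes out just under 40 cycles per iteration, slightly below the 41.6 cycles derived from the single pulse widths.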

  • Hello,

    It takes about 40 CPU cycles to read one GIO register.
  • Thank you.

    I get the same performance when reading
    temp = rtiREG1->CNT[0].FRCx;
    and
    temp = hetPORT1->DIN;

    1. Does it take ~40 cycles to read any register, or just peripherals?
    2. Is this a generic ARM-device thing or a specific Hercules-device thing?
    3. Is there documentation on this somewhere?
  • Hi Audun,

    I am sorry for my misleading message. It takes about 12 VCLK cycles to read one GIO register. My test shows that reading the gioREG->GCR0 register takes about 25 CPU cycles (1 VCLK cycle = 2 CPU cycles).

    This is my test code:

    C code:
    /* Initialize the PMU and start the cycle counter */
    _pmuInit_();
    _pmuEnableCountersGlobal_();
    _pmuResetCounters_();
    _pmuStartCounters_(pmuCYCLE_COUNTER);

    for (i = 0; i < 10; i++) {
        /* timeA: call overhead plus 10 loads */
        delay(1000);
        time0 = _pmuGetCycleCount_();
        _LDR32A10_(0xFFF7BC00); //gioREG->GCR0 + 0x20 (FLG
        time1 = _pmuGetCycleCount_();
        timeA = time1 - time0;

        /* timeB: call overhead plus 20 loads */
        delay(1000);
        time0 = _pmuGetCycleCount_();
        _LDR32A20_(0xFFF7BC00); //gioREG->GCR0 + 0x20 (FLG
        time1 = _pmuGetCycleCount_();
        timeB = time1 - time0;

        /* The difference is the cost of 10 loads; divide for cycles per load */
        time = timeB - timeA;
        time = time / 10;
    }

    Assembly code to read data from the GIO register:
    .text
    .arm

    .def _LDR32A20_
    .def _LDR32A10_

    .asmfunc

    _LDR32A10_ ;r0 = source address, r1 = destination

    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0

    bx r14

    .endasmfunc

    .asmfunc
    _LDR32A20_ ;r0 = source address, r1 = destination

    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0

    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0
    ldr r1, [r0], #0

    bx r14

    .endasmfunc
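
The idea behind the test above is differential timing: time a run of 10 loads and a run of 20 loads, subtract so that the call and counter-read overhead cancels, and divide by the 10 extra loads. A minimal host-side sketch of that arithmetic follows. The function names are made up, and the sample inputs in the usage note are illustrative numbers, not measurements from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per load from two timed runs that differ only in the number of loads.
   time_short: cycle count for the run with fewer loads (timeA above)
   time_long:  cycle count for the run with more loads (timeB above)
   extra_loads: how many more loads the long run executes (10 above)
   Fixed overhead (function call, PMU reads) is common to both and cancels. */
static uint32_t cycles_per_load(uint32_t time_short, uint32_t time_long,
                                uint32_t extra_loads)
{
    assert(extra_loads > 0);
    assert(time_long >= time_short);
    return (time_long - time_short) / extra_loads;
}

/* Convert CPU cycles to VCLK cycles.
   cpu_per_vclk is the clock ratio; 2 is assumed here, matching the reply. */
static uint32_t cpu_to_vclk_cycles(uint32_t cpu_cycles, uint32_t cpu_per_vclk)
{
    assert(cpu_per_vclk > 0);
    return cpu_cycles / cpu_per_vclk;  /* integer division truncates */
}
```

As an illustration, if a hypothetical overhead of 30 cycles and 25 cycles per load produced timeA = 280 and timeB = 530, then `cycles_per_load(280, 530, 10)` returns 25 and `cpu_to_vclk_cycles(25, 2)` returns 12, which is consistent with the "about 25 CPU cycles" and "about 12 VCLK cycles" figures quoted in the reply.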
  • Thank you. I will try out the same test as you and verify that I get the same results, but I won't get time to do it until late next week.