EK-TM4C1294XL: RAMFUNC aka running functions out of RAM is slower than running from Flash

David Whitfield86

Part Number: EK-TM4C1294XL

Running functions from RAM is typically much faster on most MCUs. However, I created a quick benchmark project which shows that running the same function from RAM is about 25% slower.

See the code below:

/******************************************************************************/
/* MISC_NOPs
*
* The function does a certain number of NOPs.
* */
/******************************************************************************/
void MISC_NOPs(unsigned int nops)
{
    unsigned int i;

    for (i = 0; i < nops; i++)
    {
        NOP();
    }
}

#pragma CODE_SECTION(MISC_NOPs_RAM,".TI.ramfunc");
/******************************************************************************/
/* MISC_NOPs_RAM
*
* The function does a certain number of NOPs.
* */
/******************************************************************************/
void MISC_NOPs_RAM(unsigned int nops)
{
    unsigned int i;

    for (i = 0; i < nops; i++)
    {
        NOP();
    }
}
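The post doesn't say how the ~13 µs and ~10 µs were measured. One common option on a Cortex-M4 is the DWT cycle counter, which counts core clock cycles directly. The sketch below is an assumption, not the original project's code: `measure_cycles`, `dwt_enable`, `dwt_read_cycles`, the `sim_*` helpers and the `TARGET_TM4C` build flag are hypothetical names. The DEMCR/DWT addresses and bit positions are the standard ARMv7-M ones; the `#else` branch only simulates a counter so the harness can be checked on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by MISC_NOPs and MISC_NOPs_RAM. */
typedef void (*nop_func_t)(unsigned int nops);
/* Any free-running 32-bit up-counter. */
typedef uint32_t (*cycle_reader_t)(void);

/* Ticks between two counter samples; unsigned subtraction stays
 * correct across a single 32-bit wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Run fn(nops) once and return the ticks it took (call overhead included). */
uint32_t measure_cycles(nop_func_t fn, unsigned int nops, cycle_reader_t read_cycles)
{
    assert(fn != NULL && read_cycles != NULL);
    uint32_t start = read_cycles();
    fn(nops);
    uint32_t end = read_cycles();
    return cycles_elapsed(start, end);
}

#if defined(TARGET_TM4C) /* hypothetical flag: building for the real board */
/* ARMv7-M debug registers (Cortex-M4). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* Enable tracing (DEMCR.TRCENA, bit 24), clear the counter, then start it
 * (DWT_CTRL.CYCCNTENA, bit 0). */
void dwt_enable(void)
{
    DEMCR |= (1u << 24);
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;
}

uint32_t dwt_read_cycles(void)
{
    return DWT_CYCCNT;
}
#else
/* Host-only simulation: a counter that starts near wraparound and
 * advances by SIM_STEP on every read. */
#define SIM_STEP 40u
static uint32_t sim_counter = 0xFFFFFFF0u;

uint32_t sim_read_cycles(void)
{
    sim_counter += SIM_STEP;
    return sim_counter;
}

/* Stand-in workload with the same signature as MISC_NOPs. */
void sim_workload(unsigned int nops)
{
    (void)nops;
}
#endif
```

On the board, calling `dwt_enable()` once and then `measure_cycles(MISC_NOPs, 100u, dwt_read_cycles)` and `measure_cycles(MISC_NOPs_RAM, 100u, dwt_read_cycles)` would give cycle counts for the same `nops` value; dividing by the core clock in MHz converts them to µs.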

The RAM version takes ~13 µs, while the Flash version takes ~10 µs.
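Because "X% slower" depends on which time is taken as the baseline, it can help to spell out the arithmetic for the two figures above: 13 µs vs. 10 µs means the RAM run takes 30% longer than the flash run, or equivalently the flash run saves about 23% of the RAM time. The helper names below are hypothetical, and the 120 MHz clock used in the note is an assumption (the TM4C1294's maximum); the post doesn't state the clock used.

```c
#include <assert.h>

/* How much longer the RAM run takes, as a percentage of the flash time. */
double percent_longer(double ram_time, double flash_time)
{
    assert(flash_time > 0.0);
    return (ram_time - flash_time) / flash_time * 100.0;
}

/* How much time the flash run saves, as a percentage of the RAM time. */
double percent_saved(double ram_time, double flash_time)
{
    assert(ram_time > 0.0);
    return (ram_time - flash_time) / ram_time * 100.0;
}

/* CPU cycles elapsed in time_us microseconds at clock_mhz (MHz). */
double us_to_cycles(double time_us, double clock_mhz)
{
    return time_us * clock_mhz;
}
```

With the measured values, `percent_longer(13.0, 10.0)` gives 30 and `percent_saved(13.0, 10.0)` about 23.1. At an assumed 120 MHz, `us_to_cycles` gives 1560 and 1200 cycles, i.e. roughly 360 extra cycles for the RAM version.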

Linker command file:

MEMORY
{
    /* Application stored in and executes from internal flash */
    FLASH (RX)         : origin = APP_BASE,   length = 0x00100000
    /* Application uses internal RAM for data */
    SRAM (RWX)         : origin = 0x20000000, length = 0x00030000
    /* Last 64 KB of SRAM: run address of the .TI.ramfunc code */
    SRAM_RAMFUNC (RWX) : origin = 0x20030000, length = 0x00010000
}

/* Section allocation in memory */

SECTIONS
{
    .intvecs    : > APP_BASE
    .text       : > FLASH
    .const      : > FLASH
    .cinit      : > FLASH
    .pinit      : > FLASH
    .init_array : > FLASH

    .vtable     : > RAM_BASE
    .data       : > SRAM
    .bss        : > SRAM
    .sysmem     : > SRAM
    .stack      : > SRAM, fill 0xAAAA5555

    .TI.ramfunc : {} LOAD = FLASH,
                     RUN = SRAM_RAMFUNC,
                     LOAD_START(RamfuncsLoadStart),
                     LOAD_SIZE(RamfuncsLoadSize),
                     LOAD_END(RamfuncsLoadEnd),
                     RUN_START(RamfuncsRunStart),
                     RUN_SIZE(RamfuncsRunSize),
                     RUN_END(RamfuncsRunEnd),
                     PAGE = 0, ALIGN(4)

}
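The `LOAD = FLASH` / `RUN = SRAM_RAMFUNC` split means the linker stores the function image in flash but links it to execute at 0x20030000, so something must copy it into SRAM before `MISC_NOPs_RAM` is first called; the post doesn't show that step. A minimal sketch, assuming the copy is done by hand at startup with the symbols defined above (a toolchain-generated copy table, if the project uses one, would make this unnecessary). `copy_ramfuncs`, `init_ramfuncs` and `TARGET_TM4C` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the .TI.ramfunc image from its load address (flash) to its run
 * address (SRAM_RAMFUNC). Must finish before any RAM function is called. */
void copy_ramfuncs(void *run_start, const void *load_start, size_t load_size)
{
    assert(run_start != NULL && load_start != NULL);
    memcpy(run_start, load_start, load_size);
}

#if defined(TARGET_TM4C) /* hypothetical flag: building for the real board */
/* Symbols created by LOAD_START, LOAD_SIZE and RUN_START in the linker
 * command file. Only their addresses are meaningful: the address of
 * RamfuncsLoadSize is the section size in bytes. */
extern uint32_t RamfuncsLoadStart;
extern uint32_t RamfuncsLoadSize;
extern uint32_t RamfuncsRunStart;

void init_ramfuncs(void)
{
    copy_ramfuncs(&RamfuncsRunStart, &RamfuncsLoadStart,
                  (size_t)(uintptr_t)&RamfuncsLoadSize);
}
#endif
```

Calling `init_ramfuncs()` once near the top of `main()`, before any code in `.TI.ramfunc` runs, would be the natural place for it.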

over 7 years ago

Bob Crosby, over 7 years ago

TI Guru, 72500 points

What level of optimization did you use? Without optimization, the loop index "i" is kept in system RAM (on the stack) and is re-read and re-written on every iteration. When the code itself also executes from RAM, these data reads and writes compete with the opcode fetches. The flash has a separate bus interface to the CPU, so RAM reads and writes can occur at the same time as instruction fetches from flash. That may explain your results.
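This explanation follows the Cortex-M4 memory map: accesses to the Code region (0x00000000 to 0x1FFFFFFF, where the TM4C1294 flash sits) go over the ICode and DCode buses, while SRAM at 0x20000000 is reached over the System bus. The sketch below classifies addresses that way. It assumes the simplified map in its comments (it ignores device-specific details such as bit-band aliases and the chip's internal bus matrix), and the function names are hypothetical. With `MISC_NOPs_RAM` at 0x20030000 (SRAM_RAMFUNC) and the stack in SRAM, both the instruction fetches and the unoptimized index accesses end up on the System bus.

```c
#include <assert.h>
#include <stdint.h>

/* Bus interfaces of the Cortex-M4 processor (simplified). */
typedef enum {
    BUS_ICODE_DCODE, /* Code region 0x00000000-0x1FFFFFFF (on-chip flash) */
    BUS_SYSTEM,      /* SRAM, peripherals and the rest of the map */
    BUS_PPB          /* Private Peripheral Bus: NVIC, SysTick, DWT, ... */
} cm4_bus_t;

/* Which bus interface an access to addr goes through. */
cm4_bus_t cm4_bus_for_address(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) {
        return BUS_ICODE_DCODE;
    }
    if (addr >= 0xE0000000u && addr <= 0xE00FFFFFu) {
        return BUS_PPB;
    }
    return BUS_SYSTEM;
}

/* Nonzero if instruction fetches from fetch_addr and data accesses to
 * data_addr both need the System bus, and so must take turns on it. */
int fetch_and_data_share_bus(uint32_t fetch_addr, uint32_t data_addr)
{
    return cm4_bus_for_address(fetch_addr) == BUS_SYSTEM &&
           cm4_bus_for_address(data_addr) == BUS_SYSTEM;
}
```

For example, a fetch from 0x20030000 with a stack access at 0x20001000 shares the bus, while a fetch from flash at 0x00004000 with the same stack access does not.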

cb1_mobile, over 7 years ago

Guru, 117855 points

I was going to comment very similarly to Vendor's Bob, while noting the following two points:

- It is recommended that programs always use the Code region (i.e. Flash) because the Cortex-M4F has separate buses that can perform instruction fetches and data accesses simultaneously.
- Your measurements (yet not your results) are sure to change if you "Cascade the NOPs" (place them back to back) rather than "Hold them Hostage" within a (so restricted/limited) loop.
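The second point can be tried by "cascading" the NOPs, i.e. partially unrolling the loop so that more of the time goes to NOPs and less to the index compare, increment and branch. This is a hedged sketch: `misc_nops_cascaded` and `NOP8` are hypothetical names, and the `NOP()` definition is a GCC/Clang-style stand-in for the project's own macro. Placing a copy in `.TI.ramfunc`, as in the original, would allow the same flash vs. RAM comparison with far less index traffic per NOP.

```c
#include <assert.h>

/* Assumption: GCC/Clang-style inline assembly. When building for the
 * board, the project's existing NOP() macro is used instead. */
#ifndef NOP
#define NOP() __asm__ volatile ("nop")
#endif

/* Eight NOPs back to back, with no loop bookkeeping between them. */
#define NOP8() do { NOP(); NOP(); NOP(); NOP(); \
                    NOP(); NOP(); NOP(); NOP(); } while (0)

/* Cascaded variant of MISC_NOPs: executes `nops` NOP instructions, eight
 * per loop pass, so the loop overhead is paid about 1/8 as often.
 * Returns the number of NOPs executed (only so the result can be checked). */
unsigned int misc_nops_cascaded(unsigned int nops)
{
    unsigned int blocks = nops / 8u;
    unsigned int rest = nops % 8u;
    unsigned int done = 0u;

    while (blocks-- > 0u) {
        NOP8();
        done += 8u;
    }
    while (rest-- > 0u) {
        NOP();
        done++;
    }
    return done;
}
```

Timing this variant from both flash and RAM would show how much of the original ~13 µs vs. ~10 µs gap comes from the loop itself.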

As the ARM design has proven superior to (past) others, the performance of "Flash vs SRAM" may prove an "advantage", not a liability...
