LP-AM243: Running the code from TCM is slower

Part Number: LP-AM243

Hi, I am using the gpio_interrupt example and want to measure the interrupt latency in software.

I measure from the interrupt source (start point) to the ISR (end point).

When I place all of the code in TCM, I count more cycles than with the default linker.cmd, where everything is placed in MSRAM.

What could be the reason for that? I expected fewer cycles when running from TCM than from MSRAM.

Here is the linker.cmd file, thanks!

/* This is the stack that is used by code running within main()
* In case of NORTOS,
* - This means all the code outside of ISR uses this stack
* In case of FreeRTOS
* - This means all the code until vTaskStartScheduler() is called in main()
* uses this stack.
* - After vTaskStartScheduler() each task created in FreeRTOS has its own stack
*/
--stack_size=4096
/* This is the heap size for malloc() API in NORTOS and FreeRTOS
* This is also the heap used by pvPortMalloc in FreeRTOS
*/
--heap_size=4096
-e_vectors /* This is the entry of the application, _vectors MUST be placed starting at address 0x0 */

/* This is the size of stack when R5 is in IRQ mode
* In NORTOS,
* - Here interrupt nesting is disabled as of now
* - This is the stack used by ISRs registered as type IRQ
* In FreeRTOS,
* - Here interrupt nesting is enabled
* - This is stack that is used initially when an IRQ is received
* - But then the mode is switched to SVC mode and SVC stack is used for all user ISR callbacks
* - Hence in FreeRTOS, IRQ stack size is less and SVC stack size is more
*/
__IRQ_STACK_SIZE = 1024;
/* This is the size of stack when R5 is in FIQ mode
* - In both NORTOS and FreeRTOS nesting is disabled for FIQ
*/
__FIQ_STACK_SIZE = 256;
__SVC_STACK_SIZE = 4096; /* This is the size of stack when R5 is in SVC mode */
__ABORT_STACK_SIZE = 256; /* This is the size of stack when R5 is in ABORT mode */
__UNDEFINED_STACK_SIZE = 256; /* This is the size of stack when R5 is in UNDEF mode */

SECTIONS
{
/* This has the R5F entry point and vector table, this MUST be at 0x0 */
.vectors:{} palign(8) > R5F_VECS

/* This has the R5F boot code until MPU is enabled, this MUST be at an address < 0x80000000
* i.e this cannot be placed in DDR
*/
GROUP {
.text.hwi: {} palign(8)
.text.cache: palign(8)
.text.mpu: palign(8)
.text.boot: palign(8)
.text:abort: palign(8) /* this helps in loading symbols when using XIP mode */
} > R5F_TCMA

/* This is rest of code. This can be placed in DDR if DDR is available and needed */

.text: {} palign(8) > R5F_TCMA

/* This is rest of initialized data. This can be placed in DDR if DDR is available and needed */
GROUP {
.data: {} palign(8) /* This is where initialized globals and static go */
} > R5F_TCMB0

/* This is rest of uninitialized data. This can be placed in DDR if DDR is available and needed */
GROUP {
.bss: {} palign(8) /* This is where uninitialized globals go */
RUN_START(__BSS_START)
RUN_END(__BSS_END)
.sysmem: {} palign(8) /* This is where the malloc heap goes */
.stack: {} palign(8) /* This is where the main() stack goes */
.rodata: {} palign(8) /* This is where const's go */
} > R5F_TCMB0

/* This is where the stacks for different R5F modes go */
GROUP {
.irqstack: {. = . + __IRQ_STACK_SIZE;} align(8)
RUN_START(__IRQ_STACK_START)
RUN_END(__IRQ_STACK_END)
.fiqstack: {. = . + __FIQ_STACK_SIZE;} align(8)
RUN_START(__FIQ_STACK_START)
RUN_END(__FIQ_STACK_END)
.svcstack: {. = . + __SVC_STACK_SIZE;} align(8)
RUN_START(__SVC_STACK_START)
RUN_END(__SVC_STACK_END)
.abortstack: {. = . + __ABORT_STACK_SIZE;} align(8)
RUN_START(__ABORT_STACK_START)
RUN_END(__ABORT_STACK_END)
.undefinedstack: {. = . + __UNDEFINED_STACK_SIZE;} align(8)
RUN_START(__UNDEFINED_STACK_START)
RUN_END(__UNDEFINED_STACK_END)
} > R5F_TCMB0

/* Sections needed for C++ projects */
// GROUP {
// .ARM.exidx: {} palign(8) /* Needed for C++ exception handling */
// .init_array: {} palign(8) /* Contains function pointers called before main */
// .fini_array: {} palign(8) /* Contains function pointers called after main */
// } > MSRAM

/* General purpose user shared memory, used in some examples */
//.bss.user_shared_mem (NOLOAD) : {} > USER_SHM_MEM
/* this is used when Debug log's to shared memory are enabled, else this is not used */
//.bss.log_shared_mem (NOLOAD) : {} > LOG_SHM_MEM
/* this is used only when IPC RPMessage is enabled, else this is not used */
//.bss.ipc_vring_mem (NOLOAD) : {} > RTOS_NORTOS_IPC_SHM_MEM
/* General purpose non cacheable memory, used in some examples */
//.bss.nocache (NOLOAD) : {} > NON_CACHE_MEM
}

/*
NOTE: Below memory is reserved for DMSC usage
- During Boot till security handoff is complete
0x701E0000 - 0x701FFFFF (128KB)
- After "Security Handoff" is complete (i.e at run time)
0x701F4000 - 0x701FFFFF (48KB)

Security handoff is complete when this message is sent to the DMSC,
TISCI_MSG_SEC_HANDOVER

This should be sent once all cores are loaded and all application
specific firewall calls are setup.
*/

MEMORY
{
R5F_VECS : ORIGIN = 0x00000000 , LENGTH = 0x00000040
R5F_TCMA : ORIGIN = 0x00000040 , LENGTH = 0x00007FC0
R5F_TCMB0 : ORIGIN = 0x41010000 , LENGTH = 0x00008000

/* memory segment used to hold CPU specific non-cached data, MAKE sure to add a MPU entry to mark this as non-cached */
//NON_CACHE_MEM : ORIGIN = 0x70060000 , LENGTH = 0x8000

/* when using multi-core application's i.e more than one R5F/M4F active, make sure
* this memory does not overlap with other R5F's
*/
MSRAM : ORIGIN = 0x70080000 , LENGTH = 0x40000

/* This section can be used to put XIP section of the application in flash, make sure this does not overlap with
* other CPUs. Also make sure to add a MPU entry for this section and mark it as cached and code executable
*/
//FLASH : ORIGIN = 0x60100000 , LENGTH = 0x80000

/* shared memory segments */
/* On R5F,
* - make sure there is a MPU entry which maps below regions as non-cache
*/
//USER_SHM_MEM : ORIGIN = 0x701D0000, LENGTH = 0x180
//LOG_SHM_MEM : ORIGIN = 0x701D0000 + 0x180, LENGTH = 0x00004000 - 0x180
//RTOS_NORTOS_IPC_SHM_MEM : ORIGIN = 0x701D4000, LENGTH = 0x0000C000
}

  • Hi E.C,

    You're using the GPIO interrupt example from MCU+SDK 08.01? https://software-dl.ti.com/mcu-plus-sdk/esd/AM243X/08_01_00_36/exports/docs/api_guide_am243x/EXAMPLES_DRIVERS_GPIO_INPUT_INTERRUPT.html

    If I take your linker command file and use it with the SDK example, will that reproduce what you're doing?

    Did you check your map file to confirm everything is located in TCM as you expect?

    How are you measuring the start/stop time? Observing the GPIO when it toggles externally (the source), and then what? Driving a GPIO in the ISR and looking at the time difference using a logic analyzer?

    What are the cycle counts for the MSRAM and TCM cases? How much higher is the cycle count when everything is placed in TCM?

    Regards,
    Frank
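
The map-file check asked about above can also be cross-checked at run time. The following is a minimal sketch, not part of the SDK: it assumes the TCM ranges from the MEMORY block of the linker command file posted in this thread (R5F_VECS plus R5F_TCMA starting at 0x00000000, R5F_TCMB0 at 0x41010000, 0x8000 bytes each), and `addr_range_in_tcm` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ranges taken from the MEMORY block of the linker command file in this thread. */
#define TCMA_BASE      0x00000000u  /* R5F_VECS (0x40 bytes) followed by R5F_TCMA */
#define TCMB_BASE      0x41010000u  /* R5F_TCMB0 */
#define TCM_BANK_SIZE  0x00008000u  /* 32 KB per bank */

/* True if [addr, addr + len) lies entirely inside [base, base + size). */
static bool range_in_region(uint32_t addr, uint32_t len, uint32_t base, uint32_t size)
{
    uint32_t offset = addr - base;  /* wraps to a huge value when addr < base */
    return (offset < size) && (len <= size - offset);
}

/* Hypothetical helper: true if the whole range sits in one of the two TCM banks. */
bool addr_range_in_tcm(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return range_in_region(addr, len, TCMA_BASE, TCM_BANK_SIZE) ||
           range_in_region(addr, len, TCMB_BASE, TCM_BANK_SIZE);
}
```

On the target, one could pass the address of the ISR or of a global such as `gStart`, cast to `uint32_t`. The map file remains the authoritative source; this only catches obvious misplacements.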

  • Hi Frank,

    Yes

    If I take your linker command file and use it with the SDK example, will that reproduce what you're doing?

    I assume yes

    Did you check your map file to confirm everything is located in TCM as you expect?

    Yes

    How are you measuring the start/stop time? Observing the GPIO when it toggles externally (the source), and then what? Driving a GPIO in the ISR and looking at the time difference using a logic analyzer?

    No, I don't observe it externally. I measure it this way: a start timestamp is captured continuously in while(1):

    while(1)
    {
    //Write to I2C0 SDL and generate an interrupt
    gStart = CycleCounterP_getCount32();
    }

    Once I push the button, I capture gEnd inside the ISR:

    static void __attribute__((section(".text.hwi"))) GPIO_bankIsrFxn(void *args)
    {
    gEnd = CycleCounterP_getCount32();

    }

    Then I simply compute gEnd - gStart. When the code runs from MSRAM I get around 90 cycles; when it runs from TCM I get around 140 cycles.
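
The gEnd - gStart computation described above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the counter is a free-running, up-counting 32-bit value (as the `uint32_t` return of `CycleCounterP_getCount32()` suggests), at most one wrap occurs between the two snapshots, and `elapsed_cycles` is a hypothetical name. Note also that, as far as the snippet shows, gStart is refreshed in a free-running loop, so the difference includes however far the loop had progressed past its last snapshot when the interrupt was taken.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cycles between two snapshots of a free-running
 * 32-bit up-counter, minus the cost of taking the snapshots themselves.
 * Unsigned subtraction is modulo 2^32, so a single counter wrap between
 * start and end still yields the correct difference.
 */
uint32_t elapsed_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t raw = end - start;
    /* Clamp at zero so a too-large overhead estimate cannot underflow. */
    return (raw > overhead) ? (raw - overhead) : 0u;
}

/* Illustrative values only, not measurements from the board. */
void elapsed_cycles_example(void)
{
    assert(elapsed_cycles(100u, 240u, 0u) == 140u);              /* no wrap */
    assert(elapsed_cycles(0xFFFFFFF0u, 0x00000050u, 6u) == 90u); /* one wrap */
    assert(elapsed_cycles(10u, 12u, 5u) == 0u);                  /* clamped */
}
```

The `overhead` argument corresponds to the back-to-back snapshot difference that the source code later in this thread stores in `gOverhead`.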

  • Hi E.C,

    Ok, thanks for the feedback.

    Are you computing min/max/average cycle counts for the MSRAM and TCM cases, and do you see any variation in the cycle counts?

    I don't see anything obvious that would cause this from the information you've shared. I'll see if I can reproduce the behavior.

    Regards,
    Frank
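
One way to collect the min/max/average figures asked about above is a small running accumulator that is updated once per capture. This is only a sketch: the `LatencyStats` type and its functions are hypothetical names, not SDK APIs, and the thread does not say how the statistics were actually computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running statistics over cycle-count captures. */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint64_t sum;    /* 64-bit so many 32-bit captures cannot overflow it */
    uint32_t count;
} LatencyStats;

void LatencyStats_init(LatencyStats *s)
{
    s->min = UINT32_MAX;
    s->max = 0u;
    s->sum = 0u;
    s->count = 0u;
}

/* Fold one capture (in cycles) into the statistics. */
void LatencyStats_add(LatencyStats *s, uint32_t cycles)
{
    if (cycles < s->min) { s->min = cycles; }
    if (cycles > s->max) { s->max = cycles; }
    s->sum += cycles;
    s->count++;
}

/* Average in cycles; only meaningful after at least one capture. */
double LatencyStats_avg(const LatencyStats *s)
{
    assert(s->count > 0u);
    return (double)s->sum / (double)s->count;
}
```

Keeping only min, max, sum, and count avoids storing every capture, which matters when data memory is limited to a 32 KB TCM bank.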

  • Are you manually pressing the SW5 button on the EVM to trigger the GPIO interrupt?

  • Hi E.C,

    I slightly modified the gpio interrupt example. This file contains the updates: /cfs-file/__key/communityserver-discussions-components-files/908/5684.gpio_5F00_input_5F00_interrupt.c

    I used the linker command file you shared, but located everything in MSRAM:

    Next I tried the linker command file you shared with everything located in TCM:

    This suggests to me you have some other modifications in the source code. Can you please share your CCS project and the source code?

    Regards,
    Frank

  • Hi Frank, I suppose the problem is that the source file / overall program is too big for the TCM? Maybe try removing unnecessary things from SysCfg?

    Anyway, below is my source code. I modified it a bit compared to the stock example since I wanted to measure the interrupt latency by toggling a GPIO. The concept is the same: measuring from trigger time to ISR execution:

    #include <kernel/dpl/DebugP.h>
    #include <kernel/dpl/ClockP.h>
    #include <kernel/dpl/AddrTranslateP.h>
    #include <kernel/dpl/HwiP.h>
    #include "ti_drivers_config.h"
    #include "ti_drivers_open_close.h"
    #include "ti_board_open_close.h"
    #include "benchmarkdemo.h"

    uint32_t gGpioBaseAddr = GPIO_PUSH_BUTTON_BASE_ADDR;
    HwiP_Object gGpioHwiObject;
    volatile uint32_t gGpioIntrDone = 0;

    static void GPIO_bankIsrFxn(void *args);

    extern void Board_gpioInit(void);
    extern void Board_gpioDeinit(void);
    extern uint32_t Board_getGpioButtonIntrNum(void);
    extern uint32_t Board_getGpioButtonSwitchNum(void);

    #include <kernel/dpl/CycleCounterP.h>

    uint32_t gStart, gEnd, gOverhead;
    uint32_t pinNum_PushButton,pinNum_Test, intrNum;
    void gpio_input_interrupt_main(void *args)
    {
    int32_t retVal;

    uint32_t bankNum, waitCount = 5;
    HwiP_Params hwiPrms;

    /* Open drivers to open the UART driver for console */
    Drivers_open();
    Board_driversOpen();
    Board_gpioInit();

    // DebugP_log("GPIO Input Interrupt Test Started ...\r\n");
    // DebugP_log("GPIO Interrupt Configured for Rising Edge (Button release will trigger interrupt) ...\r\n");

    pinNum_PushButton = GPIO_PUSH_BUTTON_PIN;
    pinNum_Test = TEST_GPIO_PIN;
    intrNum = Board_getGpioButtonIntrNum();
    bankNum = GPIO_GET_BANK_INDEX(pinNum_PushButton);

    /* Address translate */
    gGpioBaseAddr = (uint32_t) AddrTranslateP_getLocalAddr(gGpioBaseAddr);

    /* Setup GPIO for interrupt generation */
    GPIO_setDirMode(gGpioBaseAddr, pinNum_PushButton, GPIO_PUSH_BUTTON_DIR);
    GPIO_setTrigType(gGpioBaseAddr, pinNum_PushButton, GPIO_PUSH_BUTTON_TRIG_TYPE);
    GPIO_bankIntrEnable(gGpioBaseAddr, bankNum);

    /* Register pin interrupt */
    HwiP_Params_init(&hwiPrms);
    hwiPrms.intNum = intrNum;
    hwiPrms.callback = &GPIO_bankIsrFxn;
    hwiPrms.args = (void *) pinNum_PushButton;
    retVal = HwiP_construct(&gGpioHwiObject, &hwiPrms);
    DebugP_assert(retVal == SystemP_SUCCESS );

    CycleCounterP_reset();
    App_timerResetStats();

    /* Calculate overhead */
    gStart = CycleCounterP_getCount32();
    gEnd = CycleCounterP_getCount32();
    gOverhead = gEnd - gStart;

    gStart = CycleCounterP_getCount32();
    GPIO_pinWriteHigh(gGpioBaseAddr, pinNum_PushButton);
    GPIO_pinWriteLow(gGpioBaseAddr, pinNum_PushButton);

    while(1)
    {
    //Write to I2C0 SDL and generate an interrupt
    //gStart = CycleCounterP_getCount32();
    gStart = CycleCounterP_getCount32();
    GPIO_pinWriteHigh(gGpioBaseAddr, pinNum_PushButton);
    gEnd = CycleCounterP_getCount32();
    gOverhead = gEnd - gStart;
    }

    /* Unregister interrupt */
    GPIO_bankIntrDisable(gGpioBaseAddr, bankNum);
    GPIO_setTrigType(gGpioBaseAddr, pinNum_PushButton, GPIO_TRIG_TYPE_NONE);
    GPIO_clearIntrStatus(gGpioBaseAddr, pinNum_PushButton);
    HwiP_destruct(&gGpioHwiObject);

    Board_gpioDeinit();
    Board_driversClose();
    Drivers_close();
    }

    static void __attribute__((section(".text.hwi"))) GPIO_bankIsrFxn(void *args)
    {
    gEnd = CycleCounterP_getCount32();
    GPIO_pinWriteHigh(gGpioBaseAddr, pinNum_Test);
    GPIO_pinWriteLow(gGpioBaseAddr, pinNum_Test);

    uint32_t pinNum = (uint32_t) args;
    uint32_t bankNum = GPIO_GET_BANK_INDEX(pinNum);
    uint32_t intrStatus, pinMask = GPIO_GET_BANK_BIT_MASK(pinNum);

    /* Get and clear bank interrupt status */
    intrStatus = GPIO_getBankIntrStatus(gGpioBaseAddr, bankNum);
    GPIO_clearBankIntrStatus(gGpioBaseAddr, bankNum, intrStatus);

    /* Per pin interrupt handling */
    if(intrStatus & pinMask)
    {
    gGpioIntrDone++;
    }
    }

  • E.C,

    Thanks, I'll take a look.

    Regards,
    Frank

  • Hi E.C,

    I was able to get the code to fit into TCM by removing all the Debug Logs.

    I measured the interrupt latency for four cases:

    1. Debug build, original linker command file
    2. Release build, original linker command file
    3. Your linker command file, code/data linked to MSRAM
    4. Your linker command file, code/data linked to TCM

    I captured 20 measurements in each case.

    These are the min/max/avg measurements for each case:

    Build                                 Min (cycles)   Max (cycles)   Avg (cycles)
    Debug, original linker cmd file       249.00         261.00         255.10
    Release, original linker cmd file     146.00         212.00         150.50
    Release, MSRAM linker cmd file        146.00         210.00         150.75
    Release, TCM linker cmd file          134.00         138.00         135.05

    These are stem plots for the last two cases:

    As can be seen, the TCM measurements are lower than those from the MSRAM case (12 cycles on the minimum, ~16 cycles on the average). Also, the initial large measured value for the MSRAM case isn't present for the TCM case. I suspect the large initial value for MSRAM, followed by smaller values, is caused by cache.

    I shared the raw measurement .dat files in the attached file "Captures.zip". I also shared the CCS project and source code in the attached file "gpio_input_interrupt_am243x-evm_r5fss0-0_nortos_ti-arm-clang.zip".

    Please let me know what you think.

    Regards,
    Frank

    /cfs-file/__key/communityserver-discussions-components-files/908/Captures.zip

    /cfs-file/__key/communityserver-discussions-components-files/908/gpio_5F00_input_5F00_interrupt_5F00_am243x_2D00_evm_5F00_r5fss0_2D00_0_5F00_nortos_5F00_ti_2D00_arm_2D00_clang.zip

  • Frank, thank you very much for the very detailed comment.

    It was helpful.

  • E.C,

    No problem, I'm happy to help. I'll close this thread.

    Regards,
    Frank

  • Hi Frank,

    when you were running it, did you get the same values for running from TCM?

    I am getting different values when I check the cycle counts. Isn't the purpose of TCM to be deterministic? Or am I missing something here?

  • Hi E.C,

    For the test code I shared before, I get consistent results when running from TCM. I only see slight variation in the cycle counts from one cycle count snapshot to the next. This is an example run on the LP:

    I've shared the LP example code below.

    I am getting different values when I check the cycle counts.

    What are you seeing? Do you see large variation from capture to capture? Or do you see a large initial capture value followed by smaller, more consistent capture values with small variation?

    Isn't the purpose of TCM to be deterministic?

    Yes, it is supposed to be deterministic.

    Regards,
    Frank

    gpio_input_interrupt_am243x-lp_r5fss0-0_nortos_ti-arm-clang.zip

  • When I run your code, I see values such as 140, 138, 136, 142... those kinds of jumps. It never settles on a fixed value.

    Thanks
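
The two patterns asked about earlier (a large first capture followed by stable values, versus ongoing capture-to-capture variation) can be told apart by comparing the spread with and without the first capture. The sketch below assumes the captures are stored in an array; `peak_to_peak` is a hypothetical helper, not an SDK function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: max - min over samples[skip .. count - 1].
 * Use skip = 0 for the full spread and skip = 1 to ignore a possible
 * warm-up capture at the start of the run.
 */
uint32_t peak_to_peak(const uint32_t *samples, size_t count, size_t skip)
{
    assert(samples != NULL);
    assert(skip < count);

    uint32_t lo = samples[skip];
    uint32_t hi = samples[skip];
    for (size_t i = skip + 1u; i < count; i++) {
        if (samples[i] < lo) { lo = samples[i]; }
        if (samples[i] > hi) { hi = samples[i]; }
    }
    return hi - lo;
}
```

For the four values quoted above (140, 138, 136, 142) the spread is 6 cycles whether or not the first capture is skipped, which points to ongoing variation rather than a one-time warm-up effect.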

  • Hi E.C,

    I understand your point now. I'll look into this and get back with you.

    Regards,
    Frank

  • E.C,

    Sorry for the long delay on this. I'll investigate further today or tomorrow.

    Regards,
    Frank