This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

  • TI Thinks Resolved

TMDXIDK5718: M4 task execution time

Part Number: TMDXIDK5718

Hi,

I am trying to measure the execution time of a task running on the M4 in bare-metal code (no OS). How can I do that?

I tried the timer-based example provided in C:\ti\pdk_am57xx_1_0_15\packages\ti\csl\example\timer\timer_app\main_m4.c, but I am getting very large time values. I also see 120 clock ticks between two successive TIMERCounterGet calls. Why are the values this large? Am I doing something wrong?

SYS_CLK1 is the clock source for the timer, and on the IDK board it is 20 MHz.

Below are the steps I followed:

1. Started timer4 with a 60-second timeout in interrupt mode.

2. Read the counter with TIMERCounterGet before and after the task, and computed the number of clock ticks the task took.

3. Computed the elapsed time from the difference in clock ticks:

time period = clock ticks * 0.05 microseconds

Since the timer frequency is 20 MHz, one clock tick is 0.05 microseconds.
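The tick-to-time conversion described above can be sketched in plain C as follows. This is only an illustration, not code from main_m4.c: the helper names `elapsed_ticks` and `ticks_to_ns` are hypothetical, and the two counter values would come from the two TIMERCounterGet reads. The sketch assumes a 32-bit up-counter and uses integer math, so it does not depend on floating-point support.

```c
#include <stdint.h>

/* Timer input clock: 20 MHz is the SYS_CLK1 rate stated in this post. */
#define TIMER_CLK_HZ 20000000u

/* Elapsed ticks between two reads of a 32-bit up-counter (assumed).
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * even if the counter rolls over once between the two reads. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to nanoseconds using 64-bit integer math. */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t clk_hz)
{
    return ((uint64_t)ticks * 1000000000u) / clk_hz;
}
```

With these helpers, the 120 ticks reported above correspond to `ticks_to_ns(120u, TIMER_CLK_HZ)`, that is 6000 ns or 6 microseconds.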

main_m4.c

One more question: since the M4 has no hardware FPU, how much execution time does a software floating-point library add to the application?

Regards,

Naveen.

  • Hi Naveen,

    I think it's OK to use a timer for profiling M4 code. I need to check the M4 bare-metal timer example to see how it operates.

    I have a few questions for you:

    • How are you configuring the IPU SS clock? Are you using a GEL file?
    • How are you configuring the IPU SS MMUs (IPUx_UNICACHE_MMU & IPUx_MMU)? Are you relying on CSL startup code, or are you explicitly configuring these MMUs?
    • What do you mean by "task" in a bare-metal (non-OS) context? Are you profiling a function, or a smaller piece of code (e.g. a loop)?
    • Are you profiling code which reads/writes peripheral registers?
    • What leads you to conclude the timer counts are too high for the code you're profiling? Do you have an expected upper bound for the timer count?

    I don't have any information on M4 floating-point performance without an FPU. I'll investigate this further internally to see if I can locate any data. Does your application require floating-point data? Can you use the DSP extensions for your calculations?
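If the application does not strictly need floating-point data, one common alternative on cores without an FPU is fixed-point arithmetic. The sketch below shows a Q16.16 format built on ordinary integer operations. It is only an illustration under stated assumptions: the type and function names are hypothetical, it does not use any TI or CMSIS library, and whether the precision and range fit the application would have to be checked.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t q16_16_t;

#define Q16_ONE ((q16_16_t)65536) /* 1.0 in Q16.16 */

/* Convert a small integer to Q16.16. The caller must keep x within
 * -32768..32767, otherwise the multiplication overflows. */
static q16_16_t q16_from_int(int32_t x)
{
    return (q16_16_t)(x * Q16_ONE);
}

/* Multiply two Q16.16 values through a 64-bit intermediate product.
 * Dividing by 65536 rescales the product and truncates toward zero. */
static q16_16_t q16_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * (int64_t)b) / 65536);
}

/* Divide two Q16.16 values. The caller must ensure b is not zero. */
static q16_16_t q16_div(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * 65536) / b);
}
```

For example, `q16_mul(q16_from_int(3), Q16_ONE / 2)` yields the Q16.16 encoding of 1.5. The 64-bit intermediates cost extra instructions on a 32-bit core, so the actual benefit over a software floating-point library would need to be profiled.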

    Regards,
    Frank

  • In reply to Frank Livingston:

    Dear Frank,

    Below are my answers to your questions.

    1. I am using a bare-metal application with the CSL timer drivers. I test the code from CCS: I first connect to the A15, run the DDR memory configuration, and enable all cores from the Scripts tab in CCS. Then I connect to the M4 and load the code.

    On the M4 I initialize the UART, the pinmux configuration, and the module clocks. Then I initialize the timers and try to capture the execution time.

        boardCfg = BOARD_INIT_MODULE_CLOCK |
                   BOARD_INIT_PINMUX_CONFIG |
                   BOARD_INIT_UART_STDIO |
                   BOARD_INIT_UNLOCK_MMR;

        status = Board_init(boardCfg);

    I am using the example file, attached as 6087.main_m4.c.

    So I assume the IPU SS clock and IPU SS MMU initialization are done by CCS through the GEL script files during initialization.

    2. I am not configuring the unicache inside the M4 code. Could you please suggest the required code changes?

    3. I am trying to measure the execution time of a function.

    4. No, I am not doing any peripheral read/write operations.

    5. Yes. The delay between two successive TIMERCounterGet calls is 6 microseconds. Since the M4 runs at 212 MHz, each cycle takes only about 4.7 ns, so in 6 microseconds it could execute roughly 1270 instructions. Why am I reading such high values?

    The timer input clock is configured as SYSCLK1, which I believe is the 20 MHz external source on the AM5718 IDK board.
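The arithmetic in answer 5 can be checked with a short C sketch, which also shows one way to remove the cost of the timer reads themselves from a measurement. The helper names are hypothetical and nothing here comes from the CSL: the overhead value is assumed to be the tick count measured between two back-to-back counter reads with no code in between, and one instruction per cycle is an idealized assumption.

```c
#include <stdint.h>

/* CPU clock: 212 MHz is the M4 rate stated in this post. */
#define CPU_CLK_HZ 212000000u

/* Number of CPU cycles that fit in a time window given in nanoseconds.
 * 64-bit integer math avoids overflow and needs no FPU. */
static uint64_t ns_to_cycles(uint64_t ns, uint32_t cpu_hz)
{
    return (ns * cpu_hz) / 1000000000u;
}

/* Subtract the calibrated read overhead from a measured tick count,
 * clamping at zero so measurement noise cannot produce a negative result. */
static uint32_t net_ticks(uint32_t measured, uint32_t overhead)
{
    return (measured > overhead) ? (measured - overhead) : 0u;
}
```

`ns_to_cycles(6000u, CPU_CLK_HZ)` gives 1272 cycles, in the same range as the estimate in the post. With the 120-tick overhead reported earlier, a hypothetical raw measurement of 500 ticks would leave `net_ticks(500u, 120u)`, that is 380 ticks, attributable to the profiled function.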

    Please suggest what changes are required to measure the execution time of a task. We have to make an architectural decision, so please reply as soon as possible.

    Regards,

    Naveen.

  • In reply to Naveen B:

    Dear Frank,

    I am waiting for your reply. Have you tested the code I shared with you?

    Regards,

    Naveen.

  • In reply to Naveen B:

    Naveen,

    I've received the code. I'll take a look as soon as possible.

    Regards,
    Frank

  • In reply to Frank Livingston:

    Hi Naveen,

    Sorry for the delayed response on this. I've started looking at the example code on M4. Before proceeding, though, I want to make sure this is still a concern for you. Are you still facing this issue?

    Thanks,
    Frank

  • In reply to Frank Livingston:

    Dear Frank,

    After enabling the M4 IPU_UNICACHE, the measured times dropped drastically, from 210 microseconds to 20 microseconds, but that is still not acceptable for us. Given these times and the fact that the M4 has no FPU, we are considering how to proceed further.

    Will enabling the MMU reduce the duration any further? Or are there other configurations I have missed?

    Regards,

    Naveen.

  • In reply to Naveen B:

    Hi Naveen,

    Are your updated IPU_UNICACHE settings contained in the code you shared? If not, can you please provide an update?

    What is your time period budget?

    Regards,
    Frank

  • In reply to Frank Livingston:

    Dear Frank,

    I enabled the unicache from CCS:

    Connect to M4 core -> IPU1_UNICACHE_CFG -> CACHE_CONFIG -> BYPASS -> BYPASS_1

    I suppose enabling it inside the M4 code will not work, and that it has to be configured from the A15 or the boot-loader. If it is possible from the M4 core, kindly share the code.

    We want to complete some conditional checks within a 10 to 20 microsecond range. What is the throughput of the M4 core?

    Regards,

    Naveen.

  • In reply to Naveen B:

    Naveen,

    This thread concerning UNICACHE configuration for reducing code cycle counts may be of help: https://e2e.ti.com/support/processors/f/791/t/711906

    Regards,
    Frank
