
AM625: CPU processing speed

Part Number: AM625
Other Parts Discussed in Thread: SK-AM62

Hi Support Team,

My customer has measured the AM625x processing speed on the SK-AM62 as described below,
and the results show that the expected speed is not being achieved.
The customer believes this is due to the overhead of process switching in Linux,
so could you please suggest a method to reduce the Linux overhead?

The details of the measurement are as follows.

==================================================================
[Contents of the measurement]
Measuring the execution time of an IIR filter process used for signal processing and
comparing the processing speed with another CPU (Arm Cortex-M7 core at 600 MHz).

[Measurement environment and method]
-SK-AM62
-The tisdk-default-image was built on the host PC, written to eMMC, and used
according to the procedure on the following site.
dev.ti.com/.../node

-The system was configured for 1 core via U-Boot.
cpuinfo_cur_freq was changed and measurements were taken at 1400 MHz and 200 MHz.
The IIR filter program, cross-compiled on the host PC, was copied to eMMC and executed.
The execution time was measured using the time command.
5 measurements were taken and the average value was used.

-Comparison target:
Implemented on an Arm Cortex-M7 core at 600 MHz.
The program is placed in SDRAM (with cache enabled).
The time was measured with an oscilloscope using a GPIO output.
5 measurements were taken and the average value was used.


[Measured results]

Compared to the Cortex-M7 at 600 MHz, the AM62x took more than 5 times longer, even at 1400 MHz.
This is probably due to process switching by Linux or to other processes.
Note that no change in time was observed even when the system was set to 4-core operation,
the other running processes were assigned to cores 0 to 2, and the measurement process was assigned to core 3.
==================================================================


If there is any bare-metal or RTOS SDK for the A53 cores of the SK-AM62, please let us know,
as we simply want to measure the execution speed of the CPU.

Best Regards,
Kanae

  • In general I'd expect an A53 to beat an M7 per clock on an IIR filter. Is the IIR filter single-precision floating point?

    Change cpuinfo_cur_freq and measure at 1400MHz and 200MHz.
    Copy the cross-compiled IIR Filter process on HostPC to emmc and execute it.
    Measure the execution time using the time command.

    Can this IIR filter code be shared? Also, what exact command was used to run the process being benchmarked, and was the reported time the "user" row of the command's output? This looks like a real-time test, so was PREEMPT_RT Linux (the RT Linux SDK) used? The process being benchmarked should also be run at an elevated priority, for example:

    time chrt -f 10 IIR_filter

    This runs a program called IIR_filter at real-time priority 10 with FIFO scheduling and reports the time. As a template example:

    root@am64xx-evm:~# time chrt -f 10 bw_mem 8M bcopy
    8.00 996.02
    real 0m 0.61s
    user 0m 0.57s
    sys 0m 0.03s
    root@am64xx-evm:~#

    Here the test program "bw_mem 8M bcopy" spent 0.57 s in user space and 0.03 s in the kernel (sys) on its behalf; the real (wall-clock) time was 0.61 s. Looking at the user time is a way to isolate the throughput of the benchmarked application. With chrt you then control latency via prioritization. You can get further determinism with core isolation, but let's start with the above.

    -Comparison target:
    Implemented on Arm Cortex-M7 Core 600MHz system.
    The program is placed on SDRAM (with cache).

    What is the data memory usage of the IIR filter example? I'm assuming in the M7 it is placed in TCM memory?

    Probably due to process switching by Linux or other processes.

    The above guidance and questions are meant to isolate any scheduling issues. With proper RT Linux usage there should be no difference in average throughput for a small piece of code versus bare metal.

    Note that no change in time was observed even when the system was set to 4-core operation,

    A single-threaded application, for example one function, does not get parallelized on a multicore system. But if you run 4 copies of the test in parallel (in other words, 4 threads), they will most likely complete in roughly the same amount of time as one.

      Pekka

  • Hi Pekka,

    Thank you for your support.

    Comments from my customer are as follows.

    Since we were using PROCESSOR-SDK-LINUX-AM62X (Arago Project) at the time of measurement,
    we will repeat the measurement and confirm the results under the conditions you indicated.

    - Using RT-Linux
    - Using priority specification


    The answers to your questions are as follows.

    - The measurement program uses single-precision floating point.

    - The command used for the measurement is as follows.
       root@am62xx-evm:/mnt/sda_1# time ./dsp1

    - The reported time is the "user" value.

    Best Regards,
    Kanae

  • One more thing: what compiler flags did you use to compile the IIR filter test program, dsp1? My suggestion is to use -mcpu=cortex-a53 and -O3.

      Pekka

  • Hi Pekka,

    My customer reports that the SK-AM62 processing speed has been improved.
    Details are as follows.

    RT Linux, priority, etc. were specified, but there was no significant difference from the previous results,
    so we checked the listing file and found that the compiler flag settings had not been applied.

    By correctly applying the compiler flags and adding "-mcpu=cortex-a53 -O3",
    the speed became equivalent to that of the M7 core used for comparison.


    An additional question:
    With the RT Linux SDK used this time, there is no CPU clock setting.
    How can I switch the CPU clock in RT Linux?
    (The speed measurement above was done on PROCESSOR-SDK-LINUX-AM62X
    by changing cpuinfo_cur_freq, but this feature is not enabled in RT Linux.)

    For the AM57x, the following thread answered that CPUFREQ is not supported
    in the Linux RT kernel. Is it also unsupported in the AM62x RT kernel?

    https://e2e.ti.com/support/processors-group/processors/f/processors-forum/982794/am5728-how-to-modify-cpu-freq-for-linux-rt

    Best Regards,
    Kanae

  • An additional question.
    When RT-Linux SDK was used this time, there is no CPU Clock setting.
    How can I switch CPU Clock in RT-Linux?
    (The above speed measurement was done on PROCESSOR-SDK-LINUX-AM62X
    by changing cpuinfo_cur_freq, but this feature is not enabled on RT-Linux.)

    As for the AM57x example, the following site answered that CPUFREQ is not supported
    in the Linux RT kernel, but is it not supported in the AM62x RT kernel as well?

    Here are the steps to change the A53 clock speed on AM62x. Note that 1.4 GHz is not supported with the SK's power supply; it should be fine for temporary performance tests, but not for any sort of stability.

    root@am62xx-evm:~#
    root@am62xx-evm:~#
    root@am62xx-evm:~# k3conf --cpuinfo
    |--------------------------------------------------------------------------------|
    | VERSION INFO |
    |--------------------------------------------------------------------------------|
    | K3CONF | (version v0.1-54-g966a569 built Wed May 4 19:08:04 UTC 2022) |
    | SoC | AM62X SR1.0 |
    | SYSFW | ABI: 3.1 (firmware version 0x0008 '8.3.2--v08.03.02 (Jolly Jellyfi)') |
    |--------------------------------------------------------------------------------|

    |--------------------------------------------------------|
    | Processor Name | Processor State | Processor Frequency |
    |--------------------------------------------------------|
    | A53SS0_CORE_0 | DEVICE_STATE_ON | 1200000000 |
    | A53SS0_CORE_1 | DEVICE_STATE_ON | 1200000000 |
    | A53SS0_CORE_2 | DEVICE_STATE_ON | 1200000000 |
    | A53SS0_CORE_3 | DEVICE_STATE_ON | 1200000000 |
    |--------------------------------------------------------|
    root@am62xx-evm:~# k3conf dump clock | grep A53
    autoadjust_table_generic_fprint(): WARNING: "DEV_BOARD0_OBSCLK0_IN_PARENT_SAM62_A53_512KB_WRAP_MAIN_0_ARM_COREPACK_0_A53_DIVH_CLK4_OBSCLK_OUT_CLKCLK_STATE_READY" size (115) > TABLE_MAX_ELT_LEN (100)!
    | 166 | 2 | DEV_A53SS0_A53_DIVH_CLK4_OBSCLK_OUT_CLK | CLK_STATE_READY | 0 |
    | 166 | 3 | DEV_A53SS0_COREPAC_ARM_CLK_CLK | CLK_STATE_READY | 1200000000 |
    | 166 | 5 | DEV_A53SS0_PLL_CTRL_CLK | CLK_STATE_READY | 500000000 |
    | 135 | 0 | DEV_A53SS0_CORE_0_A53_CORE0_ARM_CLK_CLK | CLK_STATE_READY | 1200000000 |
    | 136 | 0 | DEV_A53SS0_CORE_1_A53_CORE1_ARM_CLK_CLK | CLK_STATE_READY | 1200000000 |
    | 137 | 0 | DEV_A53SS0_CORE_2_A53_CORE2_ARM_CLK_CLK | CLK_STATE_READY | 1200000000 |
    | 138 | 0 | DEV_A53SS0_CORE_3_A53_CORE3_ARM_CLK_CLK | CLK_STATE_READY | 1200000000 |
    | 172 | 0 | DEV_A53_RS_BW_LIMITER0_CLK_CLK | CLK_STATE_READY | 250000000 |
    | 173 | 0 | DEV_A53_WS_BW_LIMITER1_CLK_CLK | CLK_STATE_READY | 250000000 |
    | 157 | 99 | DEV_BOARD0_OBSCLK0_IN_PARENT_SAM62_A53_512KB_WRAP_MAIN_0_ARM_COREPACK_0_A53_DIVH_CLK4_OBSCLK_OUT_CLK | CLK_STATE_READY | 0 |
    root@am62xx-evm:~# k3conf set clock 166 3 1400000000
    |--------------------------------------------------------------------------------|
    | VERSION INFO |
    |--------------------------------------------------------------------------------|
    | K3CONF | (version v0.1-54-g966a569 built Wed May 4 19:08:04 UTC 2022) |
    | SoC | AM62X SR1.0 |
    | SYSFW | ABI: 3.1 (firmware version 0x0008 '8.3.2--v08.03.02 (Jolly Jellyfi)') |
    |--------------------------------------------------------------------------------|

    |----------------------------------------------------------------------------------------------------|
    | Device ID | Clock ID | Clock Name | Status | Clock Frequency |
    |----------------------------------------------------------------------------------------------------|
    | 166 | 2 | DEV_A53SS0_A53_DIVH_CLK4_OBSCLK_OUT_CLK | CLK_STATE_READY | 0 |
    | 166 | 3 | DEV_A53SS0_COREPAC_ARM_CLK_CLK | CLK_STATE_READY | 1400000000 |
    | 166 | 5 | DEV_A53SS0_PLL_CTRL_CLK | CLK_STATE_READY | 500000000 |
    |----------------------------------------------------------------------------------------------------|

    root@am62xx-evm:~# k3conf --cpuinfo
    |--------------------------------------------------------------------------------|
    | VERSION INFO |
    |--------------------------------------------------------------------------------|
    | K3CONF | (version v0.1-54-g966a569 built Wed May 4 19:08:04 UTC 2022) |
    | SoC | AM62X SR1.0 |
    | SYSFW | ABI: 3.1 (firmware version 0x0008 '8.3.2--v08.03.02 (Jolly Jellyfi)') |
    |--------------------------------------------------------------------------------|

    |--------------------------------------------------------|
    | Processor Name | Processor State | Processor Frequency |
    |--------------------------------------------------------|
    | A53SS0_CORE_0 | DEVICE_STATE_ON | 1400000000 |
    | A53SS0_CORE_1 | DEVICE_STATE_ON | 1400000000 |
    | A53SS0_CORE_2 | DEVICE_STATE_ON | 1400000000 |
    | A53SS0_CORE_3 | DEVICE_STATE_ON | 1400000000 |
    |--------------------------------------------------------|

    root@am62xx-evm:~#