This thread has been locked.

If you have a related question, please click the "Ask a related question" button in the top right corner. The newly created question will be automatically linked to this question.

AM5748: Performance monitor issue

Part Number: AM5748
Other Parts Discussed in Thread: AM5728

Perf monitoring tool reports a 1 000 000% CPU utilization in Processor SDK Linux 6.0.0.7

Reproduce with the following command:

perf stat -a sleep 5

See the following two screenshots, first is on the 5.3.0.7 release and the second is on 6.0.0.7:

I know TI equipment is great and all, but for some reason I doubt a mere SDK upgrade would unlock an amazing 327 THz CPU on the AM5728_EVM board... ;)

  • Thanks for letting us know about this. I'll pull out my EVM in the next couple of days and see if I can replicate this awesome increase in performance.

  • As an update, I've replicated these results and am working with the development team to ID a fix.

    Thanks.

  • Any updates?

    From my observations I suspect something simple, like omitting to divide actual number of cycles with the time run in microseconds. My first hunch would thus be to look in the printout code for perf. No idea where that is though...

  • Hi, 

    I'm sorry for the delay. We have duplicated this on both TI's kernel and the mainline kernel. We are trying to bisect to find where the error was introduced and debug from there. We are not perf experts and usually just use it as a tool provided by the community. We hope to have some results to share in the next couple of days.

    Thank you.

  • Hi,

    Thank you for that feedback, hoping this gets resolved soon since it triggers several automated tests for us! :) Got a workaround right now though, so no pressure!

  • Hi,

    I spent some time digging into this a bit deeper and was able to trace the change in 'perf' output between the older 05_03_00_07 SDK and our current Linux SDK back to the following commit that was introduced into our current processor SDK via the upstream Kernel:

    0aa802a7946 ("perf stat: Get rid of extra clock display function")

    https://git.ti.com/cgit/processor-sdk/processor-sdk-linux/commit/?h=processor-sdk-linux-4.19.y&id=0aa802a79469a86ebe143019144cd4df8ae852e4

    If you revert this commit on the current SDK's Kernel tree and re-build perf (which is part of the Kernel tree) then you should be able to see the output consistent with what it was before. Below are some example steps how I reverted that commit and re-built perf.

    a0797059@jiji:~/git/linux.alt
    $ git revert 0aa802a79469a86ebe143019144cd4df8ae852e4
    [detached HEAD 1c26ee21033f] Revert "perf stat: Get rid of extra clock display function"
     4 files changed, 49 insertions(+), 21 deletions(-)
    
    a0797059@jiji:~/git/linux.alt
    $ git log
    commit 1c26ee21033f4393d6a33e8e0f7d033a16bcd96c (HEAD)
    Author: Andreas Dannenberg <dannenberg@ti.com>
    Date:   Thu Nov 14 17:06:57 2019 -0600
    
        Revert "perf stat: Get rid of extra clock display function"
    
        This reverts commit 0aa802a79469a86ebe143019144cd4df8ae852e4.
    
    commit 5f8c1c6121da785bbe7ecc5896877a2537b5d6eb
    Merge: fa9e464bd7ad 1b430bd15bc7
    Author: Jacob Stiffler <a0866934@ti.com>
    Date:   Tue Oct 1 10:32:35 2019 -0500
    
        Merge pull request #184 in PROCESSOR-SDK/processor-sdk-linux from increase_ipc_memory to processor-sdk-linux-4.19.y
    
        * commit '1b430bd15bc7612371157d64f11ddffec4e484fe':
          TEMP: arm64: dts: ti: k3-am654-base-board: Increase reserve memory for RTOS IPC
    
    commit 1b430bd15bc7612371157d64f11ddffec4e484fe
    Author: Suman Anna <s-anna@ti.com>
    Date:   Mon Sep 30 13:36:11 2019 -0500
    
        TEMP: arm64: dts: ti: k3-am654-base-board: Increase reserve memory for RTOS IPC
    
        A 1 MB of carveout memory at 0xa2000000 is reserved currently for achieving
        IPC between the two MCU R5F cores when running in split-mode. The PDK IPC
        RTOS code though is currently accessing some memory beyond this 1 MB, used
        to also achieve IPC with the A53 cores running RTOS (Linux uses a separate
        1 MB region per core as part of the overall 16 MB region). Reserve an
        additional 1 MB of memory temporarily until the RTOS code logic is fixed
        up to avoid memory corruptions with the linux kernel.
    
        Signed-off-by: Suman Anna <s-anna@ti.com>
        Signed-off-by: Jacob Stiffler <j-stiffler@ti.com>
    <snip>
    
    a0797059@jiji:~/git/linux.alt
    $ alias makea
    alias makea='make ARCH=arm CROSS_COMPILE="ccache arm-linux-gnueabihf-"'
    
    a0797059@jiji:~/git/linux.alt
    $ cd tools/perf && makea EXTRA_FLAGS="--sysroot=/opt/procsdk-linux-am57xx-evm-06.01.00.08/linux-devkit/sysroots/armv7at2hf-neon-linux-gnueabi" && cd - && sudo cp -v tools/perf/perf /opt/procsdk-linux-am57xx-evm-06.01.00.08/targetNFS/usr/bin/perf
      BUILD:   Doing 'make -j8' parallel build
      HOSTCC   fixdep.o
      HOSTLD   fixdep-in.o
      LINK     fixdep
    <snip>

    Can you please see if you can re-create this doing this single change and check if now the numbers are as expected.

    I'm going to dig a bit deeper into the offending patch itself to see how the change in 'perf' results can be explained.

    Regards, Andreas

  • Hi,

    I can confirm this patch revert does fix it on our end. I wonder if upstream has already fixed this then? Might be worth checking their patches a little.

    Thank you for your hard work, just a little more to go! :)

    Edit: Looking at the patch changes, it seems like they went from millisecond resolution to nanosecond resolution, but seems to have omitted the other parts of perf printouts:

    https://lkml.org/lkml/2018/7/25/846

    More specifically, I think these two lines towards the end of the patch set explains a lot, see if you can walk the tree from there:

    -				     avg / ratio);
    + avg / (ratio * evsel->scale));
  • Sorry for double posting, but this commit is probably of interest, found in the 5.3 branch (2018-11-16):

    https://lore.kernel.org/lkml/tip-57ddf09173c1e7d0511ead8924675c7198e56545@git.kernel.org/

  • user6181252 said:
    Edit: Looking at the patch changes, it seems like they went from millisecond resolution to nanosecond resolution, but seems to have omitted the other parts of perf printouts:

    Thanks for being proactive here and further investigating this in parallel!

    Regards, Andreas

  • Hi user6181252,

    user6181252 said:
    Sorry for double posting, but this commit is probably of interest, found in the 5.3 branch (2018-11-16):

    https://lore.kernel.org/lkml/tip-57ddf09173c1e7d0511ead8924675c7198e56545@git.kernel.org/

    Yes this indeed seems to be the proper fix, as evident from the commit message. This commit however doesn't apply cleanly to our 4.19-based tree. Using 'git blame' I found a related commit eb08d006054e ("perf stat: Use perf_evsel__is_clocki() for clock events") that will make the commit you found apply correctly when cherry-picked. My commit history now looks like this:

    a0797059@jiji:~/git/linux.alt (HEAD detached at 5f8c1c6121da)
    $ git cherry-pick eb08d006054e7e374592068919e32579988602d4
    [detached HEAD da0ff45cffea] perf stat: Use perf_evsel__is_clocki() for clock events
     Author: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
     Date: Thu Nov 15 15:25:32 2018 +0530
     1 file changed, 1 insertion(+), 2 deletions(-)
    
    a0797059@jiji:~/git/linux.alt (HEAD detached from 5f8c1c6121da)
    $ git cherry-pick 57ddf09173c1e7d0511ead8924675c7198e56545
    [detached HEAD 750fdbdc5dfd] perf stat: Fix shadow stats for clock events
     Author: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
     Date: Fri Nov 16 09:58:43 2018 +0530
     1 file changed, 2 insertions(+), 1 deletion(-)
    
    a0797059@jiji:~/git/linux.alt (HEAD detached from 5f8c1c6121da)
    $ git log -3
    commit 750fdbdc5dfd72138ce54eeade4b3838b07377c1 (HEAD)
    Author: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
    Date:   Fri Nov 16 09:58:43 2018 +0530
    
        perf stat: Fix shadow stats for clock events
    
        Commit 0aa802a79469 ("perf stat: Get rid of extra clock display
        function") introduced scale and unit for clock events. Thus,
        perf_stat__update_shadow_stats() now saves scaled values of clock events
        in msecs, instead of original nsecs. But while calculating values of
        shadow stats we still consider clock event values in nsecs. This results
        in a wrong shadow stat values. Ex,
    
          # ./perf stat -e task-clock,cycles ls
            <SNIP>
                      2.60 msec task-clock:u    #    0.877 CPUs utilized
                 2,430,564      cycles:u        # 1215282.000 GHz
    
        Fix this by saving original nsec values for clock events in
        perf_stat__update_shadow_stats(). After patch:
    
          # ./perf stat -e task-clock,cycles ls
            <SNIP>
                      3.14 msec task-clock:u    #    0.839 CPUs utilized
                 3,094,528      cycles:u        #    0.985 GHz
    
        Suggested-by: Jiri Olsa <jolsa@redhat.com>
        Reported-by: Anton Blanchard <anton@samba.org>
        Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
        Reviewed-by: Jiri Olsa <jolsa@kernel.org>
        Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
        Cc: Jin Yao <yao.jin@linux.intel.com>
        Cc: Namhyung Kim <namhyung@kernel.org>
        Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
        Cc: yuzhoujian@didichuxing.com
        Fixes: 0aa802a79469 ("perf stat: Get rid of extra clock display function")
        Link: lkml.kernel.org/.../20181116042843.24067-1-ravi.bangoria@linux.ibm.com
        Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    
    commit da0ff45cffea6447a888651fb446a415d1207b87
    Author: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
    Date:   Thu Nov 15 15:25:32 2018 +0530
    
        perf stat: Use perf_evsel__is_clocki() for clock events
    
        We already have function to check if a given event is either
        SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it.
    
        Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
        Acked-by: Jiri Olsa <jolsa@kernel.org>
        Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
        Cc: Anton Blanchard <anton@samba.org>
        Cc: Jin Yao <yao.jin@linux.intel.com>
        Cc: Namhyung Kim <namhyung@kernel.org>
        Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
        Cc: yuzhoujian@didichuxing.com
        Link: lkml.kernel.org/.../20181115095533.16930-1-ravi.bangoria@linux.ibm.com
        Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    
    commit 5f8c1c6121da785bbe7ecc5896877a2537b5d6eb
    Merge: fa9e464bd7ad 1b430bd15bc7
    Author: Jacob Stiffler <a0866934@ti.com>
    Date:   Tue Oct 1 10:32:35 2019 -0500
    
        Merge pull request #184 in PROCESSOR-SDK/processor-sdk-linux from increase_ipc_memory to processor-sdk-linux-4.19.y
    
        * commit '1b430bd15bc7612371157d64f11ddffec4e484fe':
          TEMP: arm64: dts: ti: k3-am654-base-board: Increase reserve memory for RTOS IPC

    That's a good example of the power of open source - continuous improvement.

    I'm going to submit these patches internally to be included into our next SDK(s).

    Regards, Andreas

  • It is good to finally see this resolved, great detective work! We'll hot-patch this in the meanwhile while waiting for the "proper" fix, and remove our patch once we move to the next SDK. :)