AM67: Linux SDK: Dhrystone benchmark results?

Part Number: AM67

Team,
I have been looking at our app note "Sitara AM64x/AM243x Benchmarks" (https://www.ti.com/lit/pdf/spracv1), which gives a Dhrystone result of 3 DMIPS/MHz for each core.
The app note states that for a standard ARM core the DMIPS/MHz will be identical when the same compiler flags are used.

Now looking at the AM67x Linux SDK 9.02 benchmarks for Dhrystone:
https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-am67a/09_02_00/exports/docs/devices/J7_Family/linux/Release_Specific_Performance_Guide.html?highlight=ddr

It is strange that the DMIPS/MHz figures are comparable for the 3 values marked in color, as these are different cores (A53, A72), with different numbers of cores (8, 4 and 2), running at different clock speeds
(even if, according to the app note, the Dhrystone benchmarks are run from L1 memory).
Could you please confirm the values that are published?
Can you clarify why the benchmarks are similar?

Thanks in advance,

Anber

  • Hi,

    Could you please confirm the values that are published?

    Yes, these are standard test cases that are run on the farm across various devices, so the numbers are valid.

    Can you clarify why the benchmarks are similar?

    I will let our expert comment on this.

    - Keerthy

  • Hello Anber,

    Scanning the web page, it does look like there is an error in the report. The J722S (7th in the list) should report a value of ~3 (2.9), but it shows 4.4, which is in line with the value for the A72s. The Dhrystone test is a single-core test and it fits in the L1 cache. When the score is normalized to MHz as in the table (assuming the core was running at a constant speed), the main factors in the score are the compiler and its options, the CPU type, and any competing loads.
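
As a minimal sketch of the MHz normalization described above: the conventional definition divides the Dhrystones-per-second score by 1757 (the score of the VAX 11/780 reference machine) to get DMIPS, then divides by the core clock in MHz. The function name and the input numbers below are illustrative assumptions, not values taken from the SDK report.

```python
# Dhrystones per second achieved by the VAX 11/780 reference machine (defines 1 DMIPS).
VAX_DHRYSTONES_PER_SEC = 1757


def dmips_per_mhz(dhrystones_per_sec: float, cpu_mhz: float) -> float:
    """Normalize a raw single-core Dhrystone score to DMIPS/MHz."""
    if cpu_mhz <= 0:
        raise ValueError("cpu_mhz must be positive")
    dmips = dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC
    return dmips / cpu_mhz


# Hypothetical inputs chosen for round numbers, not measured values.
score = dmips_per_mhz(dhrystones_per_sec=7_028_000, cpu_mhz=1000)
print(f"{score:.2f} DMIPS/MHz")  # prints "4.00 DMIPS/MHz"
```

Because the result is divided by the clock and the test runs on one core, neither the clock speed nor the number of cores should change the figure for a given core type and build; only the compiler, its options, the CPU type, and competing load should.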

    The published values should come from runs of objects built and run on full SDK releases. It's noteworthy that this value is not optimal but represents what can be achieved with a standard build. Higher numbers are possible with different compilers and by removing all the CPU competition that can exist in a full system run. One other source of confusion is that some people publish numbers with function inlining optimizations and others without. An inlined value might reach ~6.5 or more, where a non-inlined value is around ~4.5.

    To illustrate, I'll attach a few profiles created by ARM ETM (hardware trace).

    This A72 result of ~4.95 is from GCC with no inlining on bare metal (note that all the Dhrystone functions, Proc_1-x and Func_1-x, are visible).

    This A72 result of ~6.1 is with inlining on bare metal (you can see that the number of Proc_x and Func_x functions is reduced, as the functions have been folded into each other to improve speed by removing call overhead).

    Here is a run of Dhrystone on Linux. You can clearly see overhead impacting the CPU usage. In this case the non-Dhrystone activity is mostly sparse interrupts, but if more is running it can disturb the score. This shows up as some jitter in run-to-run results. Things like thermal throttling can also produce unintuitive results.

    Regards,
    Richard W.
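
The run-to-run jitter mentioned above can be quantified by repeating the benchmark several times and summarizing the normalized scores. This is a rough sketch only: `summarize_runs` is a hypothetical helper, and the sample scores are placeholders, not measurements from any TI device.

```python
import statistics


def summarize_runs(scores):
    """Return mean, sample standard deviation and max-min spread (% of mean)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate jitter")
    mean = statistics.mean(scores)
    return {
        "mean": mean,
        "stdev": statistics.stdev(scores),
        "spread_pct": 100.0 * (max(scores) - min(scores)) / mean,
    }


# Placeholder DMIPS/MHz values standing in for repeated runs on a loaded system.
hypothetical_scores = [4.40, 4.35, 4.42, 4.38]
summary = summarize_runs(hypothetical_scores)
print(f"mean={summary['mean']:.2f} stdev={summary['stdev']:.3f} "
      f"spread={summary['spread_pct']:.1f}%")
```

A large spread relative to the mean would suggest that competing load or frequency changes (for example thermal throttling) influenced the runs, so a single published number should be read with that in mind.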
  • Hi Richard,
    Thanks for your very precise answer.

    Can we make sure that the benchmarks get corrected in the next version of the SDK doc?
    Should a JIRA ticket be entered?

    Thanks in advance,

    Anber

  • Hello Anber,

    Yes, I created a Jira ticket for this release of the SDK. Hopefully the next releases will have the J722S results in the matching column. It's clear that column 6, not 7, is where its results appear for several of the tests. The J722S A53 will underperform the J7 A72 platforms in throughput-based benchmarks. The A53 might only be better in some RT-jitter-type benchmarks.

    Regards,
    Richard W.