SYS/BIOS load statistics mismatch on AM335x?

Other Parts Discussed in Thread: AM3359, SYSBIOS

Hello,

I have a question about the CPU load measurements in SYS/BIOS on an AM3359. I have a HWI triggered by a DMTimer every 10 microseconds, and I call the BIOS load functions in a SWI triggered by the Clock module every 1 ms. The code is:

#include <ti/sysbios/utils/Load.h>

Uint32      cpu_load;
Load_Stat   load_stat_hwi;
Load_Stat   load_stat_swi;
Load_Stat   load_stat_tsk;

Void
clockFxn_1ms()
{

    /* get load statistics */
    cpu_load = Load_getCPULoad();
    Load_getGlobalHwiLoad(&load_stat_hwi);
    Load_getGlobalSwiLoad(&load_stat_swi);
    Load_getTaskLoad(task, &load_stat_tsk);
}

Load_getCPULoad() returns 28 %. The HWI load reported in load_stat_hwi is about 91630 / 600208 cycles = 15.3 %, the SWI load in load_stat_swi is negligible at 769 / 600208 = 0.13 %, and the task load is 0.

For other timer periods (5 and 20 microseconds) the reported CPU load is also nearly twice as large as the HWI load. What is the real load? What is the CPU doing in the time not reported by the Load_get*() calls? Can anyone explain the difference?
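
For readers who want to redo the percentages above, here is a minimal sketch in plain C. It assumes the two numbers in each ratio are the per-window thread time and total time read from a Load_Stat; the struct `load_sample_t` and both helper functions are hypothetical names introduced only for this illustration, so that the arithmetic can be checked off-target without the SYS/BIOS headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two counters read from a Load_Stat,
 * so that this sketch builds without the SYS/BIOS headers. */
typedef struct {
    uint32_t thread_time;   /* cycles spent in the measured thread type */
    uint32_t total_time;    /* cycles in the whole measurement window   */
} load_sample_t;

/* Load in hundredths of a percent, rounded to nearest.
 * Returns 0 for an empty window to avoid dividing by zero. */
static uint32_t load_centipercent(const load_sample_t *s)
{
    if (s->total_time == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)s->thread_time * 10000u
                       + s->total_time / 2u) / s->total_time);
}

/* Part of the reported CPU load (whole percent) that is not covered by
 * the HWI, SWI and task loads, in hundredths of a percent. */
static int32_t unattributed_centipercent(uint32_t cpu_load_percent,
                                         uint32_t hwi_cp,
                                         uint32_t swi_cp,
                                         uint32_t task_cp)
{
    return (int32_t)(cpu_load_percent * 100u)
           - (int32_t)hwi_cp - (int32_t)swi_cp - (int32_t)task_cp;
}
```

With the figures from this post, 91630 / 600208 gives 1527 (15.27 %), 769 / 600208 gives 13 (0.13 %), and the part of the 28 % CPU load not covered by HWI, SWI and task loads comes to 1260, i.e. about 12.6 percentage points.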

Thanks.

The environment is

- Code Composer Studio v 6.0.1.00040 (c:/ti/ccsv6)
- Compiler TI v5.1.9
- Industrial SDK Version 1.1.0.4 (c:/ti/am335x_sysbios_ind_sdk_1.1.0.4)
- SYS/BIOS (c:/ti/bios_6_35_04_50)
- System Analyzer (UIA Target) 1.3.1.08 (logging is off)
- RTSC (C:\ti\xdctools_3_25_03_72)

BIOS configuration is

var Defaults = xdc.useModule('xdc.runtime.Defaults');
var Diags = xdc.useModule('xdc.runtime.Diags');
var Error = xdc.useModule('xdc.runtime.Error');
var Log = xdc.useModule('xdc.runtime.Log');
var LoggerBuf = xdc.useModule('xdc.runtime.LoggerBuf');
var Main = xdc.useModule('xdc.runtime.Main');
var Memory = xdc.useModule('xdc.runtime.Memory');
var SysMin = xdc.useModule('xdc.runtime.SysMin');
var System = xdc.useModule('xdc.runtime.System');
var Text = xdc.useModule('xdc.runtime.Text');

var BIOS = xdc.useModule('ti.sysbios.BIOS');
var Clock = xdc.useModule('ti.sysbios.knl.Clock');
var Swi = xdc.useModule('ti.sysbios.knl.Swi');
var Task = xdc.useModule('ti.sysbios.knl.Task');
var Semaphore = xdc.useModule('ti.sysbios.knl.Semaphore');
var Hwi = xdc.useModule('ti.sysbios.hal.Hwi');
var Load = xdc.useModule('ti.sysbios.utils.Load');

/*
 * Uncomment this line to globally disable Asserts.
 * All modules inherit the default from the 'Defaults' module.  You
 * can override these defaults on a per-module basis using Module.common$.
 * Disabling Asserts will save code space and improve runtime performance.
Defaults.common$.diags_ASSERT = Diags.ALWAYS_OFF;
 */

/*
 * Uncomment this line to keep module names from being loaded on the target.
 * The module name strings are placed in the .const section. Setting this
 * parameter to false will save space in the .const section.  Error and
 * Assert messages will contain an "unknown module" prefix instead
 * of the actual module name.
Defaults.common$.namedModule = false;
 */

/*
 * Minimize exit handler array in System.  The System module includes
 * an array of functions that are registered with System_atexit() to be
 * called by System_exit().
 */
System.maxAtexitHandlers = 4;       

/*
 * Uncomment this line to disable the Error print function.  
 * We lose error information when this is disabled since the errors are
 * not printed.  Disabling the raiseHook will save some code space if
 * your app is not using System_printf() since the Error_print() function
 * calls System_printf().
Error.raiseHook = null;
 */

/*
 * Uncomment this line to keep Error, Assert, and Log strings from being
 * loaded on the target.  These strings are placed in the .const section.
 * Setting this parameter to false will save space in the .const section.
 * Error, Assert and Log message will print raw ids and args instead of
 * a formatted message.
Text.isLoaded = false;
 */

/*
 * Uncomment this line to disable the output of characters by SysMin
 * when the program exits.  SysMin writes characters to a circular buffer.
 * This buffer can be viewed using the SysMin Output view in ROV.
SysMin.flushAtExit = false;
 */

/*
 * The BIOS module will create the default heap for the system.
 * Specify the size of this default heap.
 */
BIOS.heapSize = 0x1000;

/*
 * Build a custom SYS/BIOS library from sources.
 */
BIOS.libType = BIOS.LibType_Custom;

/* System stack size (used by ISRs and Swis) */
Program.stack = 0x2000;

/* Circular buffer size for System_printf() */
SysMin.bufSize = 0x200;

/*
 * Create and install logger for the whole system
 */
var loggerBufParams = new LoggerBuf.Params();
loggerBufParams.numEntries = 512;
loggerBufParams.instance.name = "logger0";
loggerBufParams.exitFlush = true;
Program.global.logger0 = LoggerBuf.create(loggerBufParams);
Defaults.common$.logger = Program.global.logger0;
Main.common$.diags_INFO = Diags.ALWAYS_ON;

System.SupportProxy = SysMin;

var hwi0Params = new Hwi.Params();
hwi0Params.instance.name = "hwi_synctimer";
hwi0Params.maskSetting = xdc.module("ti.sysbios.interfaces.IHwi").MaskingOption_NONE;
Program.global.hwi_synctimer = Hwi.create(95, "&hwiFxn_synctimer", hwi0Params);
var clock0Params = new Clock.Params();
clock0Params.instance.name = "clock_1ms";
clock0Params.startFlag = true;
clock0Params.period = 1;
Program.global.clock_1ms = Clock.create("&clockFxn_1ms", 1, clock0Params);
Load.hwiEnabled = true;
Load.swiEnabled = true;
Load.windowInMs = 1;
BIOS.cpuFreq.lo = 600000000;
LoggerBuf.enableFlush = true;
Load.taskEnabled = true;
Hwi.dispatcherTaskSupport = true;
Hwi.dispatcherIrpTrackingSupport = true;
Hwi.checkStackFlag = true;

  • Hi,
    The CPU load returned by Load_getCPULoad is calculated independently of the Task, Hwi and Swi loads. We track the percentage of time the CPU is idle (when the idle Task is running) and subtract that from 100. In other words, the CPU load is just the share of time the CPU isn't in the idle task; it is not a sum of the other load measurements. So you can't expect the CPU load to increase in close proportion to the Hwi load or any of the others, since they are calculated independently. There will be a correlation, though: as you saw, the CPU load increased when the Hwi load increased.

    Let me know if this helps

    Moses
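
The idle-based definition described in the reply above can be written down as a small sketch. This is only an illustration of "100 minus the idle percentage" as stated there, not the actual SYS/BIOS implementation; the function name and the idea of passing idle and window times as plain cycle counts are assumptions made for this example.

```c
#include <stdint.h>

/* Illustrative model only: CPU load as 100 minus the percentage of the
 * window spent in the idle task. Both arguments use the same time unit
 * (for example CPU cycles). */
static uint32_t cpu_load_from_idle(uint32_t idle_time, uint32_t window_time)
{
    uint32_t idle_percent;

    if (window_time == 0u || idle_time >= window_time) {
        return 0u;   /* empty window, or idle for the whole window */
    }
    idle_percent = (uint32_t)(((uint64_t)idle_time * 100u) / window_time);
    return 100u - idle_percent;
}
```

In this model, a 600000-cycle window with 432000 idle cycles yields a load of 28 %, regardless of how the remaining 168000 cycles are split between Hwis, Swis, Tasks and kernel overhead.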
  • Hi Moses,

    Thanks for your reply. It is clear that if Load_getCPULoad counts the time in the idle loop, and the other measurements are taken in the HWI, SWI and TSK threads, there will be a SMALL difference between the sum of the HWI, SWI and TSK loads and 100 % minus the idle time, because of how the measurements are taken. I wonder why the difference is THAT BIG. What is the CPU doing in the difference: 600000 * 0.28 (not idle) - 91630 (HWI) - 769 (SWI) = 75601 cycles?

    A new measurement: with a HWI user function of 317 cycles, the CPU load is 24 %, the HWI load is 70627/600155 = 11.8 % and the SWI load is 782/600155 = 0.13 %. That is 706 cycles per HWI (100 HWIs with a period of 10 us in 1 ms).

    The BIOS benchmark (see file:///C:/ti/bios_6_35_04_50/packages/ti/sysbios/benchmarks/doc-files/A8Fg_times.html in a standard installation) states 533 + 273 cycles for the HWI prolog + epilog. With 317 user cycles this would yield (533 + 273 + 317) * 100 = 112300 cycles out of 600000 = 18.7 % CPU load from the HWI, a LITTLE BIT closer to the 24 % reported by Load_getCPULoad().

    My problem is that CPU load will be an issue in our application. Which value should I trust? From all this, my conclusion is to take 100 - Load_getCPULoad() as the reliable value for free CPU capacity, and to consider the results of the Load_getGlobalHwiLoad(&load_stat_hwi) call (and the SWI and TSK equivalents) NOT reliable. Not good news at all. Would you agree with this?
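
The cycle accounting in this post can be reproduced with a short sketch in plain C. The function names are hypothetical and exist only for this illustration; the inputs are the figures quoted in the post (window length, CPU load in percent, measured HWI and SWI cycles, and the benchmark prolog and epilog costs).

```c
#include <stdint.h>

/* Busy cycles implied by the CPU load, minus the cycles attributed to
 * HWI and SWI. A positive result is time spent somewhere the per-type
 * statistics do not see. */
static int64_t unaccounted_cycles(uint32_t window_cycles,
                                  uint32_t cpu_load_percent,
                                  uint32_t hwi_cycles,
                                  uint32_t swi_cycles)
{
    int64_t busy = (int64_t)window_cycles * cpu_load_percent / 100;
    return busy - hwi_cycles - swi_cycles;
}

/* Estimated cycles per window for an interrupt that fires `count`
 * times, given dispatcher prolog and epilog costs and the cost of the
 * user ISR body. */
static uint32_t hwi_cost_estimate(uint32_t prolog_cycles,
                                  uint32_t epilog_cycles,
                                  uint32_t user_cycles,
                                  uint32_t count)
{
    return (prolog_cycles + epilog_cycles + user_cycles) * count;
}
```

For the first measurement, `unaccounted_cycles(600000, 28, 91630, 769)` gives 75601, and `hwi_cost_estimate(533, 273, 317, 100)` gives 112300, matching the numbers in the text.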

    Frank

    PS: I have another interesting measurement concerning the TimestampProvider_get32() call on the AM3359; details are below. The code runs in main() before BIOS_start(). There is an interesting pattern in the first 30 calls: every 8th call takes up to 193 cycles (for a single instruction in TimestampProvider_get32()!), and the calls are generally much slower than the next 70 calls made in the loop (8 cycles each). Only this last value comes close to the one reported in the benchmark (6 cycles). Any explanation?

    TestTimestamp Uncertainty A8 (AM3359)

     

    #include <xdc/runtime/Types.h>
    #include <ti/sysbios/family/arm/a8/TimestampProvider.h>

    #define L_TIMESTAMPV   100

    unsigned int timestampv[L_TIMESTAMPV];

    void
    test_timestamp()
    {
       int     i;

       /* get timestamps as fast as possible */
       timestampv[0] = TimestampProvider_get32();
       timestampv[1] = TimestampProvider_get32();
       /* etc */
       timestampv[29] = TimestampProvider_get32();

       /* and in a loop ... */
       for ( i = 30; i < L_TIMESTAMPV; ++i ) {
           timestampv[i] = TimestampProvider_get32();
       }

       /* compute differences */
       for ( i = 0; i < L_TIMESTAMPV - 1; ++i ) {
           timestampv[i] = timestampv[i+1] - timestampv[i];
       }
    }

     

    produces:

     

    timestampv
    198    19     18     178    24
    23     22     21     20     19
    18     193    24     23     22
    21     20     19     18     189
    24     23     22     21     20
    19     18     193    24     23
    40     8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      8
    8      8      8      8      3813206

    Single calls: every 8th call is delayed considerably, and the calls in between follow a pattern (each one is 1 cycle shorter than the previous one).

    In the loop: faster, with no delays.
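
A note on the table: the final entry (3813206) is a raw timestamp rather than a difference, because the difference loop in the test code stops at L_TIMESTAMPV - 1 and never overwrites the last element. The sketch below is one possible way to avoid that, assuming the deltas are written to a separate array; the function names and the threshold parameter are hypothetical and not part of the original test.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the n-1 differences between consecutive timestamps into
 * `delta` and returns how many were written. Unsigned subtraction keeps
 * the result correct across a single 32-bit counter wrap. */
static size_t compute_deltas(const uint32_t *ts, size_t n, uint32_t *delta)
{
    size_t i;

    if (n < 2u) {
        return 0u;
    }
    for (i = 0u; i + 1u < n; ++i) {
        delta[i] = ts[i + 1u] - ts[i];
    }
    return n - 1u;
}

/* Counts the deltas strictly above `threshold`, e.g. to count the slow
 * calls in a series. */
static size_t count_above(const uint32_t *delta, size_t n, uint32_t threshold)
{
    size_t i;
    size_t count = 0u;

    for (i = 0u; i < n; ++i) {
        if (delta[i] > threshold) {
            ++count;
        }
    }
    return count;
}
```

Keeping the deltas in their own array means every stored value is a difference, so a threshold such as 100 cycles can be applied to the whole series without special-casing the last slot.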

     

    compiles to:

     

    205           timestampv[27] = TimestampProvider_get32();
    800015b8:   EB003206 BL             ti_sysbios_family_arm_a8_TimestampProvider_get32__E
    800015bc:   E584006C STR             R0, [R4, #108]
    206           timestampv[28] = TimestampProvider_get32();
    800015c0:   EB003204 BL             ti_sysbios_family_arm_a8_TimestampProvider_get32__E
    800015c4:   E5840070 STR             R0, [R4, #112]
    207           timestampv[29] = TimestampProvider_get32();
    800015c8:   EB003202 BL             ti_sysbios_family_arm_a8_TimestampProvider_get32__E
    209           for ( i = 30; i < L_TIMESTAMPV; ++i ) {
    800015cc:   E3A05046 MOV             R5, #70
    800015d0:   E2846078 ADD             R6, R4, #120
    207          timestampv[29] = TimestampProvider_get32();
    800015d4:   E5840074 STR             R0, [R4, #116]
    210               timestampv[i] = TimestampProvider_get32();
             $C$L1:
    800015d8:   EB0031FE BL             ti_sysbios_family_arm_a8_TimestampProvider_get32__E
    800015dc:   E4860004 STR             R0, [R6], #4
    209           for ( i = 30; i < L_TIMESTAMPV; ++i ) {
    800015e0:   E2555001 SUBS           R5, R5, #1
    213           for ( i = 0; i < L_TIMESTAMPV - 1; ++i ) {
    800015e4:   E3A0C063 MOV            R12, #99
    209           for ( i = 30; i < L_TIMESTAMPV; ++i ) {
    800015e8:   1AFFFFFA BNE             $C$L1

     

  • Hi Frank,
    If your application needs the CPU load, then Load_getCPULoad will give you a reliable representation of that. The Hwi, Swi and Task load functions are still reliable when they are used for what they represent: the Hwi, Swi and Task loads respectively. Summing them to get the CPU load is not the way to go, because there are other factors such as the time the scheduler runs and the overhead of running Tasks, Hwis and Swis. Another big factor not accounted for is the time lost to cache thrashing. The applications we use to run benchmarks are small enough to fit in cache. If you want to satisfy your curiosity, you can disable the cache and see how it affects your results. In summary, if you need the CPU load, use Load_getCPULoad. Because it is calculated from the time the CPU is not in the idle loop, it covers all these other factors.

    Regards,
    Moses